CN102609418B - Data quality grade judging method - Google Patents

Info

Publication number
CN102609418B
Authority
CN
China
Prior art keywords
data
proper vector
target data
character
character set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110023938.1A
Other languages
Chinese (zh)
Other versions
CN102609418A (en)
Inventor
齐志英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJINGDUXIU TECHNOLOGY Co Ltd
Original Assignee
BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJINGDUXIU TECHNOLOGY Co Ltd filed Critical BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority to CN201110023938.1A priority Critical patent/CN102609418B/en
Publication of CN102609418A publication Critical patent/CN102609418A/en
Application granted granted Critical
Publication of CN102609418B publication Critical patent/CN102609418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a data quality grade judging method comprising the following steps: sending an obtained target data group to a server; extracting the feature vector of each target data item in the group and performing character conversion on it to obtain the character set data corresponding to each feature vector; checking the character set data against a grade standard stored in the server to obtain the quality grade code of each feature vector; and returning each target data item and its corresponding quality grade code to the user. The method checks the quality of the target data group against the grade standard stored in the server, obtains the corresponding quality grade codes, and returns them to the user. The check is fast and accurate, and the user can process each target data item according to its quality grade code.

Description

Data quality grade judging method
Technical field
The present invention relates to a method for judging data quality, and in particular to a configurable, algorithm-based method for judging data quality grades.
Background
With the development of information technology, information of every kind has emerged in large quantities. In practice, when users retrieve or enter data, they often find that the attributes of some data records are incomplete, or that some records contain errors, such as homophone or near-homophone errors, errors between characters that look alike but differ in meaning, and non-standard use of special symbols. Incomplete and erroneous information confuses users, fails to meet the needs of those seeking information, and reduces the accuracy of retrieval and entry, which in turn affects use. It is therefore important to judge the quality of the information that is retrieved or entered: the judgment yields a quality grade code for the data, so that users can purposefully revise and complete data according to its quality grade code.
In view of the above defects, the inventor arrived at the present invention after long research and practice.
Summary of the invention
The object of the present invention is to provide a data quality grade judging method that addresses the problems arising during data retrieval and entry.
To achieve this object, the technical solution adopted by the present invention is a data quality grade judging method comprising the following steps:
Sending the obtained target data group to a server;
Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to that feature vector;
Checking each character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; and
Returning each target data item and its corresponding quality grade code to the user.
Here, a feature vector is the attribute information of a target data item; if the value of an attribute is empty, the target data item corresponding to that feature vector is incomplete.
Here, a grade standard is a structured character set used to judge the quality grade of the target data group. Different types of feature vector correspond to different grade standards: for example, a feature vector representing a time corresponds to a time grade standard, and a feature vector representing a title corresponds to a title grade standard, but the invention is not limited to these.
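The two definitions above can be illustrated with a minimal Python sketch. It assumes that a target data item is a dictionary of attributes and that a grade standard is a list of accepted strings; the names `GRADE_STANDARDS`, `extract_feature_vectors`, `empty_attributes` and `standard_for`, and all sample values, are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional

# Hypothetical grade standards keyed by feature-vector type (illustrative values).
GRADE_STANDARDS: Dict[str, List[str]] = {
    "time": ["2011-01-21", "2012-07-25"],
    "title": ["Data quality grade judging method"],
}


def extract_feature_vectors(target: Dict[str, Optional[object]]) -> Dict[str, str]:
    """Treat each attribute as a feature vector and convert its value to text."""
    return {name: "" if value is None else str(value).strip()
            for name, value in target.items()}


def empty_attributes(features: Dict[str, str]) -> List[str]:
    """Names of attributes with an empty value; any entry marks the item incomplete."""
    return [name for name, text in features.items() if not text]


def standard_for(feature_type: str) -> List[str]:
    """Look up the grade standard for a feature-vector type (empty list if none)."""
    return GRADE_STANDARDS.get(feature_type, [])


record = {"title": "Data quality grade judging method", "time": None}
features = extract_feature_vectors(record)
print(empty_attributes(features))  # attributes whose value is empty
print(standard_for("time"))        # the standard chosen for a time attribute
```

Keeping one standard per feature-vector type mirrors the statement that a time attribute is checked against a time grade standard and a title attribute against a title grade standard.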
In implementation, the invention further comprises the step of specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, which improves the speed and accuracy of the data quality grade check.
In implementation, the invention further comprises the following step: if no grade standard has been specified in advance according to the feature vectors of each target data group, the default setting applies, and the server automatically selects the grade standard with the widest scope for checking, which improves the generality of the character set data check.
In implementation, the quality grade code is set according to the degree of similarity with which each character set data matches the grade standard.
Beneficial effects of the invention: the invention checks the quality of the target data group against the grade standard stored in the server, obtains the quality grade codes corresponding to the target data group, and returns them to the user. The check is fast and accurate, and the user can process each target data item according to its quality grade code.
Brief description of the drawings
Fig. 1 is a flow chart of the data quality grade judging method of the present invention;
Fig. 2 is a flow chart of the first embodiment of the data quality grade judging method of the present invention;
Fig. 3 is a flow chart of the second embodiment of the data quality grade judging method of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, and not all, of the embodiments of the present invention.
Given a target data group, the data quality grade judging method of the present invention locates the ranges of data at different quality grades by feature vector matching; data at different quality grades can then be located and updated, which provides a basis for later optimization of the data.
Referring to Fig. 1, the invention provides a data quality grade judging method comprising the following steps:
Sending the obtained target data group to a server, where the target data group is the data whose grade is to be judged;
Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to that feature vector;
Checking each character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; and
Returning each target data item and its corresponding quality grade code to the user.
In implementation, the quality grade code is set according to the degree of similarity with which each character set data matches the grade standard.
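As a rough illustration of the four steps and of a similarity-based grade code, the following Python sketch runs the whole flow in one process. Several things here are assumptions rather than the patented method: similarity is computed with the standard library's `difflib.SequenceMatcher`, the grade thresholds (1.0 and 0.8) are arbitrary, and `SERVER_STANDARDS`, `to_character_set`, `best_similarity`, `grade_code` and `judge` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Tuple

# Hypothetical grade standards held on the server, keyed by attribute name.
SERVER_STANDARDS: Dict[str, List[str]] = {
    "city": ["Beijing", "Shanghai"],
}


def to_character_set(value: object) -> str:
    """Character conversion: render an attribute value as plain text."""
    return "" if value is None else str(value).strip()


def best_similarity(text: str, standard: Iterable[str]) -> float:
    """Highest similarity between the text and any entry of the standard."""
    return max((SequenceMatcher(None, text, entry).ratio() for entry in standard),
               default=0.0)


def grade_code(similarity: float) -> int:
    """Map similarity to a grade code; the thresholds are assumed, not from the patent."""
    if similarity == 1.0:
        return 1  # complete match, highest grade
    if similarity >= 0.8:
        return 2  # close match, probably a small error
    return 3      # poor match or empty value


def judge(target_group: List[Dict[str, object]]
          ) -> List[Tuple[Dict[str, object], Dict[str, int]]]:
    """Return every target data item together with a grade code per attribute."""
    results = []
    for target in target_group:
        codes = {}
        for attribute, value in target.items():
            text = to_character_set(value)
            standard = SERVER_STANDARDS.get(attribute, [])
            codes[attribute] = grade_code(best_similarity(text, standard))
        results.append((target, codes))
    return results


results = judge([{"city": "Beijing"}, {"city": "Beijng"}, {"city": ""}])
for target, codes in results:
    print(target, codes)
```

In a real deployment the call to `judge` would sit behind the server interface that receives the target data group; the sketch omits transport entirely.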
Referring to Fig. 2, the first embodiment of the data quality grade judging method of the present invention comprises the following steps:
S101: sending the obtained target data group to a server, where the target data group is the data whose grade is to be judged;
S102: extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to that feature vector; here, a feature vector is the attribute information of a target data item, and if the value of an attribute is empty, the target data item corresponding to that feature vector is incomplete;
S103: specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, which improves the speed and accuracy of the data quality grade check;
S104: checking each character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; here, a grade standard is a structured character set used to judge the quality grade of the target data group, and different types of feature vector correspond to different grade standards, for example a time grade standard for a feature vector representing a time and a title grade standard for a feature vector representing a title, but the invention is not limited to these; and
S105: returning each target data item and its corresponding quality grade code to the user.
Preferably, step S103 may be performed before step S101 or S102.
In this embodiment, when the character set data is checked against the grade standard corresponding to the feature vector, the character set data may first be split in order to speed up the check. The splitting mode may be splitting by character, splitting by whitespace and character count, or splitting with the characters of an information dictionary as the standard, but is not limited to these.
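The three splitting modes named above might look as follows in Python. This is only one reading of the text: "whitespace and character count" is taken to mean splitting on whitespace and then cutting each token into pieces of at most `max_chars` characters, and dictionary-based splitting is sketched as greedy longest match. All function names are hypothetical.

```python
from typing import List, Set


def split_by_character(text: str) -> List[str]:
    """Mode 1: every character becomes its own piece."""
    return list(text)


def split_by_whitespace_and_count(text: str, max_chars: int) -> List[str]:
    """Mode 2: split on whitespace, then cut each token into pieces of max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces: List[str] = []
    for token in text.split():
        pieces.extend(token[i:i + max_chars] for i in range(0, len(token), max_chars))
    return pieces


def split_by_dictionary(text: str, dictionary: Set[str]) -> List[str]:
    """Mode 3: greedy longest match against dictionary entries.

    Characters not covered by any entry are emitted one at a time.
    """
    max_len = max((len(entry) for entry in dictionary), default=1)
    max_len = max(max_len, 1)
    pieces: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                pieces.append(piece)
                i += size
                break
    return pieces


print(split_by_character("abc"))
print(split_by_whitespace_and_count("data quality grade", 4))
print(split_by_dictionary("dataquality", {"data", "quality"}))
```

Splitting first lets each piece be compared with a small part of the grade standard instead of the whole string, which is presumably where the claimed speed-up comes from.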
In this embodiment, when the character set data is checked, each character set data is matched one by one against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target data item corresponding to that character set data has the highest quality grade; the remaining items are ranked in descending order of similarity, which yields the quality grade codes of the target data group corresponding to the character set data.
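One way to read "ranked in order" is a dense ranking by similarity in which grade 1 is reserved for a complete match. The Python sketch below follows that reading; the use of `difflib.SequenceMatcher` as the similarity measure and the names `best_similarity` and `rank_grades` are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable


def best_similarity(text: str, standard: Iterable[str]) -> float:
    """Highest similarity between the text and any entry of the grade standard."""
    return max((SequenceMatcher(None, text, entry).ratio() for entry in standard),
               default=0.0)


def rank_grades(character_sets: Dict[str, str], standard: Iterable[str]) -> Dict[str, int]:
    """Assign grade codes by descending similarity; 1 is kept for a 100% match."""
    entries = list(standard)
    scores = {key: best_similarity(text, entries) for key, text in character_sets.items()}
    # Include 1.0 so that grade 1 always means a complete match,
    # even when no item in this group matches completely.
    levels = sorted(set(scores.values()) | {1.0}, reverse=True)
    return {key: levels.index(score) + 1 for key, score in scores.items()}


grades = rank_grades(
    {"a": "2011-01-21", "b": "2011-01-2", "c": "unknown"},
    ["2011-01-21"],
)
print(grades)
```

Items with equal similarity share a grade code, so the number of distinct codes depends on the data rather than on a fixed scale.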
Referring to Fig. 3, the second embodiment of the present invention comprises the following steps:
S201: sending the obtained target data group to a server, where the target data group is the data whose grade is to be judged;
S202: extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to that feature vector; here, a feature vector is the attribute information of a target data item, and if the value of an attribute is empty, the target data item corresponding to that feature vector is incomplete;
S203: if no grade standard has been specified in advance according to the feature vectors of each target data group, applying the default setting, under which the server automatically selects the grade standard with the widest scope for checking;
S204: checking each character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; here, a grade standard is a structured character set used to judge the quality grade of the target data group, and different types of feature vector correspond to different grade standards, for example a time grade standard for a feature vector representing a time and a title grade standard for a feature vector representing a title, but the invention is not limited to these; and
S205: returning each target data item and its corresponding quality grade code to the user.
Preferably, step S203 may be performed before step S201 or S202.
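The fallback in step S203 can be sketched in Python as below. The patent does not define what "widest scope" means; this sketch assumes it is the standard with the most entries. `STANDARDS` and `select_standard` are hypothetical names with illustrative contents.

```python
from typing import Dict, Optional, Set

# Hypothetical grade standards stored on the server (illustrative contents).
STANDARDS: Dict[str, Set[str]] = {
    "time": {"2011-01-21"},
    "title": {"Data quality grade judging method",
              "Method and system for analyzing data quality"},
    "general": {"2011-01-21",
                "Data quality grade judging method",
                "Method and system for analyzing data quality"},
}


def select_standard(specified: Optional[str], standards: Dict[str, Set[str]]) -> Set[str]:
    """Use the standard named in advance, else default to the widest one.

    "Widest" is assumed here to mean the standard with the most entries.
    """
    if specified is not None:
        return standards[specified]
    return max(standards.values(), key=len)


print(select_standard("time", STANDARDS))       # a standard was specified in advance
print(len(select_standard(None, STANDARDS)))    # default: the widest standard
```

A pre-specified standard narrows the comparison and so favors speed and accuracy, whereas the default trades both for generality, which matches the stated purpose of the two embodiments.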
In this embodiment, when the character set data is checked, each character set data is matched one by one against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target data item corresponding to that character set data has the highest quality grade; the remaining items are ranked in descending order of similarity, which yields the quality grade codes of the target data group corresponding to the character set data.
In this embodiment, when the character set data is checked against the grade standard corresponding to the feature vector, the character set data may first be split in order to speed up the check. The splitting mode may be splitting by character, splitting by whitespace and character count, or splitting with the characters of an information dictionary as the standard, but is not limited to these.
The same one-by-one matching, with a 100% similarity indicating the highest quality grade and the remaining items ranked in order, applies to the invention generally and to each of its embodiments.
The results of the invention are returned to the user so that the user can process the target data group according to the quality grade codes, for example by retaining the original information or by correcting the target data that has low quality grade codes.
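On the user side, acting on the returned grade codes could be as simple as the following Python sketch. It assumes integer grade codes where 1 is the best grade; `partition_by_grade` and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def partition_by_grade(results: List[Tuple[Record, int]],
                       worst_acceptable: int = 1) -> Tuple[List[Record], List[Record]]:
    """Split returned items into those to retain and those needing correction."""
    retain: List[Record] = []
    correct: List[Record] = []
    for target, code in results:
        (retain if code <= worst_acceptable else correct).append(target)
    return retain, correct


returned = [
    ({"city": "Beijing"}, 1),
    ({"city": "Beijng"}, 2),
    ({"city": ""}, 3),
]
retain, correct = partition_by_grade(returned)
print(retain)   # items kept as they are
print(correct)  # items queued for data correction
```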
In implementation, the invention and its embodiments may generate an algorithm configuration file from the feature vectors, grade numbers, grade standards, algorithm statements and the like, so that the main program reads this algorithm configuration file to control and process the data and to generate program control data. Here, a feature vector is an attribute of the target data group; a grade number is a mark of how good or bad the data quality of the target data group is; a grade standard is the structured character set used to judge the quality grade of the target data group, with different feature vectors corresponding to different grade standards.
The algorithm statements comprise: first, matching the grade standard corresponding to the target data group according to the feature vector; second, checking the quality of the target data group against the grade standard; and finally, returning the sets of target data for the different grades.
Program control data is data that has been structured by the program, and comprises feature vectors, grade standards and algorithm handles.
For example, if the target data group resides in a data table, an attribute can be understood as a field name, that is, a feature vector; the returned sets correspond to the result sets captured for each degree of quality:
Attribute 1 | Attribute 2 | Attribute 3 |
Grade 1 | Grade 2 | Grade 3 |
Second step: reading the algorithm configuration file to form the program control data;
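Reading such a configuration file into program control data might be sketched in Python as follows. The patent shows only the pipe-delimited rows; the `key=value` line prefix, the keys `features` and `grades`, and the names `CONFIG_TEXT`, `parse_config` and `build_control_data` are assumptions made for this illustration, and algorithm handles are left out.

```python
from typing import Dict, List

# Hypothetical configuration text; only the pipe-delimited lists follow the example above.
CONFIG_TEXT = """\
features=Attribute 1|Attribute 2|Attribute 3|
grades=Grade 1|Grade 2|Grade 3|
"""


def parse_config(text: str) -> Dict[str, List[str]]:
    """Parse 'key=item|item|...' lines into lists, ignoring blank lines."""
    parsed: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")
        parsed[key.strip()] = [item.strip() for item in value.split("|") if item.strip()]
    return parsed


def build_control_data(config: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Form program control data: the feature vectors and grade numbers to use."""
    return {
        "feature_vectors": config.get("features", []),
        "grade_numbers": config.get("grades", []),
    }


control = build_control_data(parse_config(CONFIG_TEXT))
print(control)
```

Keeping feature vectors and grades in a file rather than in code is what makes the method "configurable": a new attribute or grade can be added without changing the main program.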
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. A person of ordinary skill in the art may modify the technical solutions of the invention or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention. Accordingly, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention also covers them.

Claims (2)

1. A data quality grade judging method, characterized by comprising the following steps:
Sending the obtained target data group to a server;
Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to that feature vector; specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, and, if no grade standard has been specified in advance according to the feature vectors of each target data group, applying the default setting, under which the server automatically selects the grade standard with the widest scope for checking;
Checking each character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector;
And returning each target data item and its corresponding quality grade code to the user;
Wherein, during the checking of the data, the character set data is first split, the splitting mode comprising splitting by character, splitting by whitespace and character count, or splitting with the characters of an information dictionary as the standard.
2. The data quality grade judging method according to claim 1, characterized in that the quality grade code is set according to the degree of similarity with which each character set data matches the grade standard.
CN201110023938.1A 2011-01-21 2011-01-21 Data quality grade judging method Active CN102609418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110023938.1A CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110023938.1A CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Publications (2)

Publication Number Publication Date
CN102609418A CN102609418A (en) 2012-07-25
CN102609418B true CN102609418B (en) 2015-02-04

Family

ID=46526800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110023938.1A Active CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Country Status (1)

Country Link
CN (1) CN102609418B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1698111A (en) * 2001-12-05 2005-11-16 皇家飞利浦电子股份有限公司 Method and apparatus for verifying the integrity of system data
CN101576893A (en) * 2008-05-09 2009-11-11 北京世纪拓远软件科技发展有限公司 Method and system for analyzing data quality

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100576207C (en) * 2007-05-29 2009-12-30 北大方正集团有限公司 Remove the method for repeating objects based on metadata

Also Published As

Publication number Publication date
CN102609418A (en) 2012-07-25


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 100085 2 floor 1, four street, Haidian District, Beijing.

Patentee after: BeijingDuxiu Technology Co., Ltd.

Address before: 100085 C-710, Jiahua building, nine, Shang di San Jie, Haidian District, Beijing.

Patentee before: BeijingDuxiu Technology Co., Ltd.