CN102609418A - Data quality grade judging method - Google Patents

Data quality grade judging method

Info

Publication number
CN102609418A
Authority
CN
China
Legal status
Granted
Application number
CN2011100239381A
Other languages
Chinese (zh)
Other versions
CN102609418B (en)
Inventor
齐志英
Current Assignee
BEIJINGDUXIU TECHNOLOGY Co Ltd
Original Assignee
BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority date
2011-01-21
Filing date
2011-01-21
Publication date
2012-07-25
Application filed by BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority to CN201110023938.1A
Publication of CN102609418A
Application granted
Publication of CN102609418B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a data quality grade judging method comprising the following steps: sending an acquired target data group to a server; extracting a feature vector of each target datum in the group and performing character conversion on it to obtain the character-set data corresponding to each feature vector; checking the character-set data against grade standards stored on the server to obtain a quality grade code for each feature vector; and returning each target datum with its quality grade code to the user. Because the quality of the target data group is checked against grade standards stored on the server, the method obtains the quality grade codes for the group and returns them to the user. Checking is fast and accurate, and the user can then process each target datum according to its quality grade code.

Description

Data quality grade judging method
Technical field
The present invention relates to a method for judging data quality, and in particular to a configurable, algorithm-based method for judging data quality grades.
Background technology
With the continuous development of information technology, information of all kinds has proliferated. In practice, users who retrieve or enter data often find that the attributes of some records are incomplete, or that records contain erroneous information such as miswritten characters, non-standard representations of special symbols, homophones or near-homophones, and characters that look alike but differ in meaning. Such incomplete and erroneous information confuses users, fails to meet the needs of those seeking information, and lowers the accuracy of retrieval and entry, which impairs use. It is therefore important to judge the quality of the information that is retrieved or entered, that is, to obtain a quality grade code for the data, so that users can purposefully correct and complete data according to its quality grade code.
In view of these shortcomings, the inventor arrived at the present invention after long-term research and practice.
Summary of the invention
The object of the invention is to provide a data quality grade judging method that addresses the problems arising in data information retrieval and entry.
To achieve this object, the technical solution adopted by the invention is a data quality grade judging method comprising the following steps:
Sending an acquired target data group to a server;
Extracting a feature vector of each target datum in the target data group and performing character conversion on each feature vector to obtain the character-set data corresponding to it;
Checking each item of character-set data against the grade standards stored on the server to obtain a quality grade code for each feature vector; and
Returning each target datum and its quality grade code to the user.
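The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent specifies neither the similarity measure, the grade bands nor the data format, so the reference values in `REFERENCE_VALUES`, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 cut-off in `grade_code` and all function names are hypothetical, and a plain Python list stands in for the group sent to the server.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# HYPOTHETICAL reference values per feature; the patent does not publish its grade standards.
REFERENCE_VALUES = {
    "date": ["2011-01-21", "2012-07-25"],
    "title": ["Data quality grade judging method"],
}


@dataclass
class GradedRecord:
    record: dict   # the original target datum, returned unchanged
    grades: dict   # feature name -> quality grade code (1 = best)


def extract_features(record: dict) -> dict:
    """Step 2a: the feature vector is the record's attribute information."""
    return {name: record.get(name) for name in REFERENCE_VALUES}


def to_charset(value) -> str:
    """Step 2b: character conversion; a missing value becomes the empty string."""
    return "" if value is None else str(value)


def similarity(text: str, references: list) -> float:
    """Best SequenceMatcher ratio against any reference (an assumed similarity measure)."""
    return max((SequenceMatcher(None, text, ref).ratio() for ref in references), default=0.0)


def grade_code(score: float) -> int:
    """HYPOTHETICAL banding: exact match -> 1, ratio >= 0.8 -> 2, otherwise 3."""
    if score == 1.0:
        return 1
    return 2 if score >= 0.8 else 3


def judge(target_group: list) -> list:
    """Steps 3 and 4; the list argument stands in for the group sent to the server (step 1)."""
    results = []
    for record in target_group:
        features = extract_features(record)
        grades = {
            name: grade_code(similarity(to_charset(value), REFERENCE_VALUES[name]))
            for name, value in features.items()
        }
        results.append(GradedRecord(record, grades))
    return results  # step 4: every target datum travels back with its grade codes
```

For example, `judge([{"date": "2011-01-21", "title": None}])` gives the date grade 1 and the empty title grade 3, because an empty string has zero similarity to any non-empty reference.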
The feature vector is the attribute information of the target datum. If the value of an attribute is empty, the target datum corresponding to that feature vector is incomplete.
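A minimal sketch of the incompleteness rule, assuming each target datum is a Python dict keyed by attribute name. Treating whitespace-only strings as empty is an extra assumption, and the function names are hypothetical.

```python
def feature_vector(record: dict, attributes: list) -> dict:
    """Collect the attribute information of one target datum (absent attributes -> None)."""
    return {name: record.get(name) for name in attributes}


def is_empty(value) -> bool:
    """ASSUMPTION: None and whitespace-only strings both count as an empty value."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_attributes(vector: dict) -> list:
    """Names of attributes with empty values; a non-empty result marks the datum incomplete."""
    return [name for name, value in vector.items() if is_empty(value)]
```

For example, a record `{"title": "A", "author": " "}` checked against `["title", "author", "year"]` is reported as missing `author` and `year`.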
A grade standard is a structured character set used to judge the quality grade of the target data group. Different types of feature vectors correspond to different grade standards: for example, a feature vector representing a time corresponds to a time grade standard, and a feature vector representing a title corresponds to a title grade standard, although the standards are not limited to these.
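One way to read "structured character set" is as an ordered set of patterns per feature type, each tied to a grade code. The sketch below assumes that reading; the regular expressions, grade numbers and names in `GRADE_PATTERNS` are hypothetical illustrations, not standards taken from the patent.

```python
import re

# HYPOTHETICAL grade standards: for each feature type, patterns ordered from
# grade 1 (best) downward. A value matching none of them gets the lowest grade.
GRADE_PATTERNS = {
    "time": [
        (1, re.compile(r"\d{4}-\d{2}-\d{2}")),   # full date, e.g. 2011-01-21
        (2, re.compile(r"\d{4}-\d{2}")),         # year and month only
        (3, re.compile(r"\d{4}")),               # year only
    ],
    "title": [
        (1, re.compile(r"[\w ]+")),              # word characters and spaces only
        (2, re.compile(r"[\w\s\-:,.()]+")),      # common punctuation also allowed
    ],
}


def grade_by_standard(feature_type: str, text: str) -> int:
    """Return the grade code of the first pattern that matches the whole text."""
    standard = GRADE_PATTERNS[feature_type]
    for code, pattern in standard:
        if pattern.fullmatch(text):
            return code
    return len(standard) + 1  # no pattern matched: lowest grade for this feature type
```

Keeping one pattern list per feature type mirrors the text's point that a time feature and a title feature are judged by different standards.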
In an implementation, the invention further comprises specifying in advance, according to the feature vectors of each target data group, which grade standard is used for checking, so as to improve the speed and accuracy of data quality grade checking.
In an implementation, the invention further comprises: if no grade standard has been specified in advance according to the feature vectors of the target data group, the default setting applies and the server automatically selects the grade standard with the widest range for checking, which makes checking of character-set data more general.
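The choice between a pre-specified standard and the server's default can be sketched as below. The patent does not define "widest range"; this sketch assumes it means the standard covering the most feature types, preferring standards that cover the feature being checked. The catalogue `STANDARD_COVERAGE` and the function name are hypothetical.

```python
from typing import Optional

# HYPOTHETICAL catalogue: standard name -> feature types the standard can check.
STANDARD_COVERAGE = {
    "time_standard": {"time"},
    "title_standard": {"title"},
    "general_text_standard": {"time", "title", "author", "publisher"},
}


def choose_standard(feature_type: str, specified: Optional[str] = None) -> str:
    """Use a standard specified in advance; otherwise fall back to the default,
    taken here to be the standard with the widest coverage."""
    if specified is not None:
        if specified not in STANDARD_COVERAGE:
            raise KeyError(f"unknown grade standard: {specified}")
        return specified
    # Prefer standards that cover this feature type; if none does, consider all of them.
    candidates = [name for name, types in STANDARD_COVERAGE.items()
                  if feature_type in types] or list(STANDARD_COVERAGE)
    return max(candidates, key=lambda name: len(STANDARD_COVERAGE[name]))
```

Specifying the standard up front skips the search entirely, which is consistent with the stated aim of faster checking; the default path trades precision for generality.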
In an implementation, the quality grade codes are arranged in order of the similarity between each item of character-set data and the grade standard it is matched against.
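Arranging grade codes by similarity amounts to ranking. This sketch assumes the similarities are already computed as numbers in [0, 1] and uses dense ranking, so equal similarities share a code; the patent does not say how ties are handled, and the function name is hypothetical.

```python
def arrange_grade_codes(similarities: dict) -> dict:
    """Map each item to a grade code: 1 for the highest similarity, 2 for the
    next distinct value, and so on (dense ranking, so ties share a code)."""
    distinct = sorted(set(similarities.values()), reverse=True)
    code_of = {score: code for code, score in enumerate(distinct, start=1)}
    return {item: code_of[score] for item, score in similarities.items()}
```

For example, similarities `{"a": 1.0, "b": 0.75, "c": 1.0, "d": 0.5}` yield codes `{"a": 1, "b": 2, "c": 1, "d": 3}`.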
Beneficial effects of the invention: the quality of a target data group is checked against grade standards stored on a server, and the corresponding quality grade codes are obtained and returned to the user. Checking is fast and accurate, and the user can process each target datum according to its quality grade code.
Description of drawings
Fig. 1 is a flowchart of the data quality grade judging method of the invention;
Fig. 2 is a flowchart of a first embodiment of the data quality grade judging method of the invention;
Fig. 3 is a flowchart of a second embodiment of the data quality grade judging method of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.
Given a target data group, the data quality grade judging method of the invention locates the data regions of different quality grades by matching feature vectors. The data at each quality grade can then be located and updated, which provides a basis for later optimization of the data.
Referring to Fig. 1, the invention provides a data quality grade judging method comprising the following steps:
Sending an acquired target data group to a server, where the target data group is the data whose grade is to be judged;
Extracting a feature vector of each target datum in the target data group and performing character conversion on each feature vector to obtain the character-set data corresponding to it;
Checking each item of character-set data against the grade standards stored on the server to obtain a quality grade code for each feature vector; and
Returning each target datum and its quality grade code to the user.
In an implementation, the quality grade codes are arranged in order of the similarity between each item of character-set data and the grade standard it is matched against.
Referring to Fig. 2, a first embodiment of the data quality grade judging method of the invention comprises the following steps:
S101: sending an acquired target data group to a server, where the target data group is the data whose grade is to be judged;
S102: extracting a feature vector of each target datum in the target data group and performing character conversion on each feature vector to obtain the corresponding character-set data, where the feature vector is the attribute information of the target datum; if the value of an attribute is empty, the target datum corresponding to that feature vector is incomplete;
S103: specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, so as to improve the speed and accuracy of data quality grade checking;
S104: checking each item of character-set data against the grade standards stored on the server to obtain a quality grade code for each feature vector, where a grade standard is a structured character set used to judge the quality grade of the target data group and different types of feature vectors correspond to different grade standards (for example, a time grade standard for a feature vector representing a time and a title grade standard for a feature vector representing a title, without being limited to these); and
S105: returning each target datum and its quality grade code to the user.
Preferably, step S103 may be performed before step S101 or S102.
In this embodiment, when the character-set data is checked against the grade standard corresponding to its feature vector, the character-set data may first be split to speed up checking. It may be split into individual characters, split on whitespace and by character count, or split according to the words of an information dictionary; the splitting methods are not limited to these.
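The three splitting options might look like this in Python. For dictionary-based splitting the sketch substitutes forward maximum matching, a common greedy segmentation technique; the patent does not name an algorithm. The `width` parameter, the example dictionary and the function names are hypothetical.

```python
def split_by_character(text: str) -> list:
    """Split into single characters."""
    return list(text)


def split_by_whitespace_and_count(text: str, width: int) -> list:
    """Split on whitespace, then cut each token into chunks of at most `width` characters."""
    chunks = []
    for token in text.split():
        chunks.extend(token[i:i + width] for i in range(0, len(token), width))
    return chunks


def split_by_dictionary(text: str, dictionary: set) -> list:
    """Forward maximum matching: at each position take the longest dictionary word;
    characters not covered by any word are emitted one at a time."""
    longest = max(1, max((len(word) for word in dictionary), default=1))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:   # a single character always succeeds
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, `split_by_dictionary("数据质量等级", {"数据", "质量", "等级"})` returns `["数据", "质量", "等级"]`. Splitting first lets each short piece be compared against the grade standard instead of the whole string.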
In this embodiment, during checking, each item of character-set data is matched in turn against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target datum corresponding to that character-set data has the highest quality grade. Ranking the items in this way yields the quality grade codes of the target data group corresponding to the character-set data.
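A minimal sketch of the one-by-one matching, assuming a grade standard can be represented as a list of reference strings and substituting `difflib.SequenceMatcher.ratio()` as the similarity measure, since the patent does not specify one. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(text: str, standard: list) -> float:
    """Similarity of one character-set datum to its grade standard, in [0, 1].
    An exact match scores 1.0 (100%), which the text treats as the highest grade."""
    if text in standard:
        return 1.0
    return max((SequenceMatcher(None, text, ref).ratio() for ref in standard), default=0.0)


def grade_group(charset_data: list, standard: list) -> list:
    """Match each datum in turn, then order the group from best to worst match."""
    scored = [(text, match_score(text, standard)) for text in charset_data]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Checking for exact membership first makes the 100% case explicit and avoids computing a ratio when it is not needed.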
Referring to Fig. 3, a second embodiment of the invention comprises the following steps:
S201: sending an acquired target data group to a server, where the target data group is the data whose grade is to be judged;
S202: extracting a feature vector of each target datum in the target data group and performing character conversion on each feature vector to obtain the corresponding character-set data, where the feature vector is the attribute information of the target datum; if the value of an attribute is empty, the target datum corresponding to that feature vector is incomplete;
S203: if no grade standard has been specified in advance according to the feature vectors of the target data group, applying the default setting, in which the server automatically selects the grade standard with the widest range for checking;
S204: checking each item of character-set data against the grade standards stored on the server to obtain a quality grade code for each feature vector, where a grade standard is a structured character set used to judge the quality grade of the target data group and different types of feature vectors correspond to different grade standards (for example, a time grade standard for a feature vector representing a time and a title grade standard for a feature vector representing a title, without being limited to these); and
S205: returning each target datum and its quality grade code to the user.
Preferably, step S203 may be performed before step S201 or S202.
In this embodiment, during checking, each item of character-set data is matched in turn against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target datum corresponding to that character-set data has the highest quality grade. Ranking the items in this way yields the quality grade codes of the target data group corresponding to the character-set data.
In this embodiment, when the character-set data is checked against the grade standard corresponding to its feature vector, the character-set data may first be split to speed up checking. It may be split into individual characters, split on whitespace and by character count, or split according to the words of an information dictionary; the splitting methods are not limited to these.
In the checking of character-set data in the invention and its embodiments, each item of character-set data is matched in turn against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target datum corresponding to that character-set data has the highest quality grade. Ranking the items in this way yields the quality grade codes of the target data group corresponding to the character-set data.
The results of the invention are returned to the user so that the user can conveniently process the target data group according to its quality grade codes, for example by keeping the original information or by correcting target data according to their quality grade codes.
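A sketch of how a user might act on the returned codes, assuming results arrive as `(record, grade_code)` pairs with 1 as the best grade. The cut-off parameter `worst_acceptable` is a hypothetical user choice, not something the patent defines.

```python
def route_for_processing(graded: list, worst_acceptable: int = 1) -> tuple:
    """Split (record, grade_code) pairs into data kept as-is and data flagged
    for correction; grade codes above the HYPOTHETICAL cut-off need correction."""
    keep = [record for record, code in graded if code <= worst_acceptable]
    correct = [record for record, code in graded if code > worst_acceptable]
    return keep, correct
```

For example, with a cut-off of 2, records graded 1 and 2 are kept and a record graded 3 is sent for correction.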
In the invention and its embodiments, the feature vectors, grade numbers, grade standards, algorithm statements and the like can be written to an algorithm configuration file, which the main program reads in order to control and process the data; in an implementation, program control data is generated from this file. Here, a feature vector is an attribute of the target data group; a grade number is an identifier of how good or poor the quality of the target data group is; and a grade standard is the structured character set used to judge the quality grade of the target data group, with different feature vectors corresponding to different grade standards.
The algorithm statement comprises: first, matching the grade standard corresponding to the target data group according to its feature vectors; second, checking the quality of the target data group against that grade standard; and finally, returning the clusters of target data at the different grades.
Program control data is the data constructed and processed by the program, including the feature vectors, grade standards and algorithm handles.
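The three-part algorithm statement could be sketched as below. The patent does not say how per-feature grade codes combine into a grade for the whole datum; this sketch assumes the worst (largest) code wins. The `check` callback and the other names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Optional


def run_algorithm(records: list, standards: dict,
                  check: Callable[[object, object], int]) -> dict:
    """(1) look up the grade standard for each feature of a record,
    (2) check the value against it to obtain a grade code,
    (3) collect the records into one cluster per grade code."""
    clusters: dict = defaultdict(list)
    for record in records:
        codes = [check(standards[name], value)
                 for name, value in record.items() if name in standards]
        # ASSUMPTION: a record's grade is its worst per-feature code;
        # records with no feature covered by a standard are filed under None.
        grade: Optional[int] = max(codes) if codes else None
        clusters[grade].append(record)
    return dict(clusters)
```

Passing the check as a callback keeps the matching step independent of how a particular grade standard is represented.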
For example, if the target data group resides in a data table, its attributes can be regarded as the field names, that is, the feature vectors, and the returned sets correspond to the result sets captured for each quality range, from good to poor:
Attribute 1 | Attribute 2 | Attribute 3
Grade 1 | Grade 2 | Grade 3
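For the data-table case, Python's standard `sqlite3` module exposes the field names through `cursor.description`. The choice of SQLite, the table name and the function name are assumptions for illustration only.

```python
import sqlite3


def feature_vectors_from_table(conn: sqlite3.Connection, table: str) -> tuple:
    """Treat the table's field names as the feature vector and each row as one target datum."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')  # table name assumed to be trusted
    fields = [column[0] for column in cursor.description]
    rows = [dict(zip(fields, row)) for row in cursor.fetchall()]
    return fields, rows
```

A NULL column comes back as `None`, so an incomplete row is easy to spot before grading.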
Second step: read the algorithm configuration file to form the program control data.
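A configuration file of the kind described could be loaded as sketched below. The patent gives no file format, so the JSON layout, the key names and the `ProgramControlData` class are hypothetical. The sketch only checks that the four kinds of entry are present and that every grade standard belongs to a declared feature.

```python
import json
from dataclasses import dataclass


@dataclass
class ProgramControlData:
    feature_vectors: list   # attribute (field) names of the target data group
    grade_numbers: list     # identifiers of the quality grades, best first
    grade_standards: dict   # feature name -> reference entries for that feature
    algorithm: list         # ordered names of the processing steps


REQUIRED_KEYS = {"feature_vectors", "grade_numbers", "grade_standards", "algorithm"}


def load_control_data(text: str) -> ProgramControlData:
    """Parse a HYPOTHETICAL JSON algorithm configuration file into program control data."""
    cfg = json.loads(text)
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise ValueError(f"configuration is missing: {sorted(missing)}")
    unknown = set(cfg["grade_standards"]) - set(cfg["feature_vectors"])
    if unknown:
        raise ValueError(f"standards for undeclared features: {sorted(unknown)}")
    return ProgramControlData(cfg["feature_vectors"], cfg["grade_numbers"],
                              cfg["grade_standards"], cfg["algorithm"])
```

Validating the file once at load time means the main program can rely on the control data without re-checking it during grading.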
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Those of ordinary skill in the art may modify the technical solution of the invention or replace some of its technical features with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments. Accordingly, if such modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them.

Claims (4)

1. A data quality grade judging method, characterized in that it comprises the following steps:
Sending an acquired target data group to a server;
Extracting a feature vector of each target datum in the target data group and performing character conversion on each feature vector to obtain the character-set data corresponding to it;
Checking each item of character-set data against the grade standards stored on the server to obtain a quality grade code for each feature vector; and
Returning each target datum and its quality grade code to the user.
2. The data quality grade judging method according to claim 1, characterized in that it further comprises the step of specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking.
3. The data quality grade judging method according to claim 1, characterized in that it further comprises the step of: if no grade standard has been specified in advance according to the feature vectors of the target data group, applying the default setting, in which the server automatically selects the grade standard with the widest range for checking.
4. The data quality grade judging method according to claim 2 or 3, characterized in that the quality grade codes are arranged in order of the similarity between each item of character-set data and the grade standard it is matched against.
CN201110023938.1A 2011-01-21 2011-01-21 Data quality grade judging method Active CN102609418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110023938.1A CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110023938.1A CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Publications (2)

Publication Number Publication Date
CN102609418A true CN102609418A (en) 2012-07-25
CN102609418B CN102609418B (en) 2015-02-04

Family

ID=46526800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110023938.1A Active CN102609418B (en) 2011-01-21 2011-01-21 Data quality grade judging method

Country Status (1)

Country Link
CN (1) CN102609418B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1698111A (en) * 2001-12-05 2005-11-16 皇家飞利浦电子股份有限公司 Method and apparatus for verifying the integrity of system data
CN101286156A (en) * 2007-05-29 2008-10-15 北大方正集团有限公司 Method for removing repeated object based on metadata
CN101576893A (en) * 2008-05-09 2009-11-11 北京世纪拓远软件科技发展有限公司 Method and system for analyzing data quality

Also Published As

Publication number Publication date
CN102609418B (en) 2015-02-04

Similar Documents

Publication Publication Date Title
CN112199375B (en) Cross-modal data processing method and device, storage medium and electronic device
US10460029B2 (en) Reply information recommendation method and apparatus
US20200234002A1 (en) Optimization techniques for artificial intelligence
US11915104B2 (en) Normalizing text attributes for machine learning models
CN104933152A (en) Named entity recognition method and device
CN106202380B (en) Method and system for constructing classified corpus and server with system
CN103593412B (en) A kind of answer method and system based on tree structure problem
CN111026886A (en) Multi-round dialogue processing method for professional scene
CN107004141A (en) To the efficient mark of large sample group
CN113536081B (en) Data center data management method and system based on artificial intelligence
CN110874536B (en) Corpus quality evaluation model generation method and double-sentence pair inter-translation quality evaluation method
Zarisheva et al. Dialog act annotation for twitter conversations
CN116049345B (en) Document-level event joint extraction method and system based on bidirectional event complete graph
CN112906393A (en) Meta learning-based few-sample entity identification method
CN114282513A (en) Text semantic similarity matching method and system, intelligent terminal and storage medium
CN109063772A (en) A kind of image individuation semantic analysis, device and equipment based on deep learning
CN111597336A (en) Processing method and device of training text, electronic equipment and readable storage medium
CN102609418A (en) Data quality grade judging method
CN115438645A (en) Text data enhancement method and system for sequence labeling task
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN115794997A (en) Enterprise matching degree processing method and device based on enterprise labels
CN114547391A (en) Message auditing method and device
CN107608955B (en) Inter-translation method and device for named entities in Hanzang
CN116992874B (en) Text quotation auditing and tracing method, system, device and storage medium
CN116149258B (en) Numerical control machine tool code generation method based on multi-mode information and related equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 100085 2 floor 1, four street, Haidian District, Beijing.

Patentee after: BeijingDuxiu Technology Co., Ltd.

Address before: 100085 C-710, Jiahua building, nine, Shang di San Jie, Haidian District, Beijing.

Patentee before: BeijingDuxiu Technology Co., Ltd.

CP02 Change in the address of a patent holder