CN102609418A - Data quality grade judging method - Google Patents
- Publication number: CN102609418A (application number CN201110023938.1)
- Authority: CN (China)
- Prior art keywords: data, quality, grade, target data, feature vector
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a data quality grade judging method comprising the following steps: sending an obtained target data group to a server; extracting a feature vector of each target data item in the group and performing character conversion on each feature vector to obtain the character set data corresponding to it; checking the character set data against a grade standard stored in the server to obtain a quality grade code for each feature vector; and returning each target data item and its quality grade code to the user. With this method, the quality of the target data group is checked against the grade standard stored in the server, and the resulting quality grade codes are returned to the user. The checking is fast and highly accurate, and the user can then process each target data item according to its quality grade code.
Description
Technical field
The present invention relates to a method for judging data quality, and in particular to a configurable, algorithm-based method for judging data quality grades.
Background art
With the continuing development of information technology, information of all kinds is proliferating. In practice, users retrieving or entering data often find that some records have incomplete attributes or contain erroneous information, such as wrongly written characters, look-alike characters with different meanings, homophones and near-homophones, and non-standard use of special symbols. Such incomplete and erroneous information confuses users, fails to meet the needs of those seeking information, and lowers the accuracy of retrieval or entry, which in turn hampers use. It is therefore particularly important to judge the quality of the information input for retrieval or entry: that is, to perform a quality judgement on retrieval or entry information and obtain a quality grade code for the data, so that users can purposefully correct and refine data according to its quality grade code.
In view of the above shortcomings, the inventors arrived at the present invention after long-term research and practice.
Summary of the invention
The object of the invention is to provide a data quality grade judging method that addresses the problems arising in data information retrieval and entry.
To achieve this object, the technical solution adopted by the invention is a data quality grade judging method comprising the following steps:
Sending the obtained target data group to a server;
Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to it;
Checking each item of character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; and
Returning each target data item and its corresponding quality grade code to the user.
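The four steps can be illustrated with a minimal in-process sketch. Everything concrete in it is an assumption, not taken from the patent: the "server" is simulated by a local dictionary, each grade standard is reduced to a regular expression per attribute, and the three-level grade codes (1 = well-formed, 2 = malformed, 3 = incomplete) are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical grade standards held "on the server": one regex per feature
# (attribute) name. The patent does not specify their format.
GRADE_STANDARDS: Dict[str, re.Pattern] = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "title": re.compile(r"[A-Za-z0-9 ,.'-]+"),
}

def extract_feature_vector(record: Dict[str, object]) -> Dict[str, object]:
    """Step 2a: the feature vector is the record's attribute information."""
    return {name: record.get(name) for name in GRADE_STANDARDS}

def to_character_set(features: Dict[str, object]) -> Dict[str, str]:
    """Step 2b: character conversion; missing values become empty strings."""
    return {name: "" if value is None else str(value).strip()
            for name, value in features.items()}

def check(chars: Dict[str, str]) -> int:
    """Step 3: hypothetical grade codes. 3 = some attribute is empty,
    1 = every attribute matches its standard, 2 = otherwise (malformed)."""
    if any(value == "" for value in chars.values()):
        return 3
    if all(GRADE_STANDARDS[name].fullmatch(value) for name, value in chars.items()):
        return 1
    return 2

def judge_quality(group: List[Dict[str, object]]) -> List[Tuple[Dict[str, object], int]]:
    """Steps 1-4 (step 1 simulated): grade every record in the target data
    group and return each record paired with its quality grade code."""
    return [(record, check(to_character_set(extract_feature_vector(record))))
            for record in group]
```

For example, grading `[{"date": "2011-01-21", "title": "Data quality"}, {"date": "21/01/2011", "title": "Data quality"}, {"date": None, "title": "x"}]` yields the codes 1, 2 and 3 in that order.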
Here, the feature vector is the attribute information of the target data; if the value of an attribute is empty, the target data to which that feature vector belongs is incomplete.
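The completeness rule (an empty attribute value marks the record as incomplete) can be sketched as follows. Treating an absent key, `None`, or a whitespace-only string as "empty" is an assumption of this sketch; the function names are hypothetical.

```python
from typing import Dict, List, Optional

def missing_attributes(record: Dict[str, Optional[str]], attributes: List[str]) -> List[str]:
    """Return the attributes whose value is empty (absent, None, or blank)."""
    return [name for name in attributes
            if record.get(name) is None or str(record.get(name)).strip() == ""]

def is_complete(record: Dict[str, Optional[str]], attributes: List[str]) -> bool:
    """A record is complete only if none of its attributes is empty."""
    return not missing_attributes(record, attributes)
```

Returning the list of missing attributes, rather than only a boolean, lets a caller report which fields need to be filled in.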
The grade standard is a structured character set used to judge the quality grade of the target data group. Different types of feature vectors correspond to different grade standards; for example, a feature vector representing a time corresponds to a time grade standard, and a feature vector representing a title corresponds to a title grade standard, but the invention is not limited to these.
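One way to read "structured character set" is as a set of permitted characters plus a required layout. The sketch below models it that way and keeps one standard per feature type (time and title). The `GradeStandard` class, its fields, and both example standards are hypothetical.

```python
import re
import string
from dataclasses import dataclass

@dataclass(frozen=True)
class GradeStandard:
    """Hypothetical model of a 'structured character set': the characters a
    value may contain plus the structure (regex) they must follow."""
    name: str
    allowed: frozenset
    structure: str

    def conforms(self, value: str) -> bool:
        return set(value) <= self.allowed and re.fullmatch(self.structure, value) is not None

# One standard per feature-vector type, following the text: time vs. title.
STANDARDS = {
    "time": GradeStandard("time grade standard",
                          frozenset(string.digits + "-: "),
                          r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2})?"),
    "title": GradeStandard("title grade standard",
                           frozenset(string.ascii_letters + string.digits + " ,.:'-"),
                           r"\S+( \S+)*"),  # no leading, trailing or doubled spaces
}

def standard_for(feature_type: str) -> GradeStandard:
    """Look up the grade standard for a given feature-vector type."""
    return STANDARDS[feature_type]
```

Under these assumptions `"2011-01-21"` conforms to the time standard, while `"2011/01/21"` does not, because `/` is outside its character set.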
In implementation, the invention further comprises the step of specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, so as to improve the speed and accuracy of data quality grade checking.
In implementation, the invention further comprises the step of: if no grade standard has been specified in advance according to the feature vectors of the target data group, falling back to the default setting, in which the server automatically selects the grade standard with the widest range for checking, so as to make character set data checking more general.
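The two selection modes (a standard specified in advance, or a server-side default) can be sketched as below. The patent does not define "widest range"; this sketch assumes it means the standard that permits the most characters. The standards and function name are hypothetical.

```python
import string
from typing import Dict, Optional

# Hypothetical standards, each reduced to its permitted character set.
STANDARDS: Dict[str, frozenset] = {
    "time": frozenset(string.digits + "-: "),
    "title": frozenset(string.ascii_letters + string.digits + " ,.:'-"),
}

def select_standard(feature: str, specified: Optional[Dict[str, str]] = None) -> str:
    """Return the key of the grade standard to check `feature` against.

    If a standard was specified in advance for this feature, use it;
    otherwise fall back to the default: the standard with the widest
    range, read here as the one allowing the most characters."""
    if specified and feature in specified:
        return specified[feature]
    return max(STANDARDS, key=lambda key: len(STANDARDS[key]))
```

With `{"pub_date": "time"}` passed in, `select_standard("pub_date", ...)` returns `"time"`; with nothing specified it falls back to `"title"`, the broader of the two sets.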
In implementation, the quality grade codes are ordered by the degree of similarity with which each item of character set data matches the grade standard.
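Ordering grade codes by similarity might look like the following sketch. The patent names no similarity measure; Python's `difflib.SequenceMatcher.ratio()` is used here purely as a stand-in, and the grade standard is simplified to a single reference string. Function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

def similarity(value: str, reference: str) -> float:
    """Stand-in similarity measure in [0, 1]; the patent names none."""
    return SequenceMatcher(None, value, reference).ratio()

def grade_codes(values: List[str], reference: str) -> Dict[str, int]:
    """Assign grade codes 1, 2, ... in descending order of similarity,
    so code 1 goes to the value closest to the grade standard.
    Duplicate values collapse into a single dictionary entry."""
    ranked = sorted(values, key=lambda v: similarity(v, reference), reverse=True)
    return {value: code for code, value in enumerate(ranked, start=1)}
```

For the reference `"Beijing"`, the values `"Bj"`, `"Beijing"` and `"Beijin"` receive codes 3, 1 and 2 respectively.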
Beneficial effects of the invention: the invention checks the quality of the target data group against the grade standards stored in the server, obtains the quality grade codes corresponding to the target data group, and returns them to the user. Checking is fast and highly accurate, and the user can process each target data item according to its quality grade code.
Description of the drawings
Fig. 1 is a flowchart of the data quality grade judging method of the invention;
Fig. 2 is a flowchart of a first embodiment of the data quality grade judging method of the invention;
Fig. 3 is a flowchart of a second embodiment of the data quality grade judging method of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention.
Given a target data group, the data quality grade judging method of the invention locates the data regions of different quality grades by matching feature vectors; the data of each quality grade can then be located and updated, providing a basis for later optimization of the data.
Referring to Fig. 1, the invention provides a data quality grade judging method comprising the following steps:
Sending the obtained target data group to a server, where the target data group is the data whose grade is to be judged;
Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to it;
Checking each item of character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; and
Returning each target data item and its corresponding quality grade code to the user.
In implementation, the quality grade codes are ordered by the degree of similarity with which each item of character set data matches the grade standard.
Referring to Fig. 2, a first embodiment of the data quality grade judging method of the invention comprises the following steps:
S101: Send the obtained target data group to the server, where the target data group is the data whose grade is to be judged;
S102: Extract the feature vector of each target data item in the target data group, and perform character conversion on each feature vector to obtain its corresponding character set data. The feature vector is the attribute information of the target data; if the value of an attribute is empty, the target data to which that feature vector belongs is incomplete;
S103: Specify in advance, according to the feature vectors of each target data group, the grade standard to be used for checking, so as to improve the speed and accuracy of data quality grade checking;
S104: Check each item of character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector. The grade standard is a structured character set for judging the quality grade of the target data group, and different types of feature vectors correspond to different grade standards; for example, a feature vector representing a time corresponds to a time grade standard and a feature vector representing a title corresponds to a title grade standard, without being limited to these; and
S105: Return each target data item and its corresponding quality grade code to the user.
Preferably, step S103 may be performed before step S101 or S102.
In this embodiment, when the character set data is checked against the grade standard corresponding to its feature vector, the character set data may first be split to speed up checking. Splitting may be done character by character, by whitespace and character count, or using the characters of the information dictionary as the reference, but is not limited to these methods.
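The three splitting modes could be sketched as follows. The patent gives no algorithm for dictionary-based splitting; forward maximum matching, a common approach to segmenting Chinese text against a word list, is substituted here as an assumption. The `max_len` parameter and all function names are hypothetical.

```python
from typing import Iterable, List

def split_chars(text: str) -> List[str]:
    """Split into single characters."""
    return list(text)

def split_whitespace(text: str, max_len: int) -> List[str]:
    """Split on whitespace, then cut any token longer than max_len characters."""
    pieces: List[str] = []
    for token in text.split():
        pieces.extend(token[i:i + max_len] for i in range(0, len(token), max_len))
    return pieces

def split_dictionary(text: str, dictionary: Iterable[str]) -> List[str]:
    """Forward maximum matching against a word dictionary; characters not
    covered by any dictionary entry are emitted one at a time."""
    words = set(dictionary)
    longest = max([len(w) for w in words] + [1])
    pieces: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; size 1 always succeeds.
        for size in range(min(longest, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in words:
                pieces.append(text[i:i + size])
                i += size
                break
    return pieces
```

For instance, `split_dictionary("数据质量", ["数据", "质量"])` yields `["数据", "质量"]`, while `split_whitespace("data quality", 4)` yields `["data", "qual", "ity"]`.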
In this embodiment, when the character set data is checked, each item of character set data is matched one by one against its corresponding grade standard. A complete match, i.e. 100% similarity, indicates that the target data corresponding to that character set data has the highest quality grade; the items are ranked in order, yielding the quality grade codes of the target data group corresponding to the character set data.
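Item-by-item matching, with a 100% match receiving the best grade, might be sketched with similarity bands. The patent fixes only that an exact match is the highest grade. The band thresholds below, the modelling of a grade standard as a list of reference entries, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence

# Hypothetical similarity bands; only "100% is best" comes from the text.
BANDS = (1.0, 0.8, 0.5)

def best_similarity(value: str, standard: Sequence[str]) -> float:
    """Match one character-set value against every entry of its standard."""
    return max((SequenceMatcher(None, value, entry).ratio() for entry in standard),
               default=0.0)

def grade(value: str, standard: Sequence[str]) -> int:
    """Code 1 for an exact (100%) match, then 2, 3, ... for lower bands."""
    score = best_similarity(value, standard)
    for code, threshold in enumerate(BANDS, start=1):
        if score >= threshold:
            return code
    return len(BANDS) + 1

def grade_all(values: List[str], standard: Sequence[str]) -> List[int]:
    """Check the character-set data one item at a time, in input order."""
    return [grade(v, standard) for v in values]
```

Bands keep the codes stable across batches, whereas the pure ranking approach assigns codes relative to the other items in the same group.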
Referring to Fig. 3, a second embodiment of the invention comprises the following steps:
S201: Send the obtained target data group to the server, where the target data group is the data whose grade is to be judged;
S202: Extract the feature vector of each target data item in the target data group, and perform character conversion on each feature vector to obtain its corresponding character set data. The feature vector is the attribute information of the target data; if the value of an attribute is empty, the target data to which that feature vector belongs is incomplete;
S203: If no grade standard has been specified in advance according to the feature vectors of the target data group, fall back to the default setting, in which the server automatically selects the grade standard with the widest range for checking;
S204: Check each item of character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector. The grade standard is a structured character set for judging the quality grade of the target data group, and different types of feature vectors correspond to different grade standards; for example, a feature vector representing a time corresponds to a time grade standard and a feature vector representing a title corresponds to a title grade standard, without being limited to these; and
S205: Return each target data item and its corresponding quality grade code to the user.
Preferably, step S203 may be performed before step S201 or S202.
In this embodiment, when the character set data is checked, each item of character set data is matched one by one against its corresponding grade standard. A complete match, i.e. 100% similarity, indicates that the target data corresponding to that character set data has the highest quality grade; the items are ranked in order, yielding the quality grade codes of the target data group corresponding to the character set data.
In this embodiment, when the character set data is checked against the grade standard corresponding to its feature vector, the character set data may first be split to speed up checking. Splitting may be done character by character, by whitespace and character count, or using the characters of the information dictionary as the reference, but is not limited to these methods.
In both the invention and its embodiments, when the character set data is checked, each item of character set data is matched one by one against its corresponding grade standard. A complete match, i.e. 100% similarity, indicates that the target data corresponding to that character set data has the highest quality grade; the items are ranked in order, yielding the quality grade codes of the target data group corresponding to the character set data.
The results of the invention are returned to the user so that the user can conveniently process the target data group according to its quality grade codes, for example by retaining the original information or by correcting the data of the target data group according to its quality grade code.
In the invention and its embodiments, the feature vectors, grade numbers, grade standards, algorithm statements and similar items can be written to an algorithm configuration file, which the main program reads to control and process the data, generating program control data at run time. Here, a feature vector is an attribute of the target data group; a grade number is an identifier of how good or poor the data quality of the target data group is; and a grade standard is a structured character set for judging the quality grade of the target data group, with different feature vectors corresponding to different grade standards.
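An algorithm configuration file holding these items, and the loader that turns it into program control data, might look like the sketch below. The patent does not specify a file format; JSON, every key name, and the `ProgramControlData` class are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical algorithm configuration file; JSON is an assumed format.
CONFIG_TEXT = """
{
  "feature_vectors": ["pub_date", "title"],
  "grade_numbers": {"1": "complete and well-formed", "2": "malformed", "3": "incomplete"},
  "grade_standards": {"pub_date": "time", "title": "title"},
  "algorithm": ["match_standard", "check_quality", "return_clusters"]
}
"""

@dataclass
class ProgramControlData:
    """In-memory form of the configuration that the main program acts on."""
    feature_vectors: List[str]
    grade_numbers: Dict[int, str]
    grade_standards: Dict[str, str]
    algorithm: List[str]

def load_config(text: str) -> ProgramControlData:
    """Parse the configuration text into program control data."""
    raw = json.loads(text)
    return ProgramControlData(
        feature_vectors=list(raw["feature_vectors"]),
        # JSON object keys are always strings; grade numbers are used as ints.
        grade_numbers={int(k): v for k, v in raw["grade_numbers"].items()},
        grade_standards=dict(raw["grade_standards"]),
        algorithm=list(raw["algorithm"]),
    )
```

Keeping this data in a file rather than in code is what makes the method configurable: changing a grade standard or the grade numbering requires no change to the main program.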
The algorithm statement comprises: first, matching the target data group to its corresponding grade standards according to the feature vectors; second, checking the quality of the target data group against those grade standards; and finally, returning the clusters of target data of each grade.
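The three-part algorithm statement (match standards, check quality, return per-grade clusters) can be sketched as below. The grading rule, one grade step per feature that fails its standard, is hypothetical, as is the representation of each standard as a predicate on strings.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]

def run_algorithm(group: List[Record],
                  standards: Dict[str, Callable[[str], bool]]) -> Dict[int, List[Record]]:
    """1) match each feature to its standard, 2) check the record,
    3) return the records clustered by grade code."""
    clusters: Dict[int, List[Record]] = defaultdict(list)
    for record in group:
        # Hypothetical rule: grade code = 1 + number of failing features.
        failures = sum(1 for feature, conforms in standards.items()
                       if not conforms(record.get(feature, "")))
        clusters[1 + failures].append(record)
    return dict(clusters)
```

For example, with standards `{"year": str.isdigit, "title": lambda s: s.strip() != ""}`, a record with year `"20xx"` and a non-empty title lands in the grade-2 cluster.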
Program control data means the data constructed by the program for processing, including the feature vectors, the grade standards, and the algorithm handles.
For example, if the target data group is stored in a data table, its attributes can be regarded as the table's field names, i.e. the feature vectors, and the returned sets correspond to the result sets captured for each range of quality, from good to poor:
Attribute 1 | Attribute 2 | Attribute 3 |
---|---|---|
Grade 1 | Grade 2 | Grade 3 |
Second step: read the algorithm configuration file to form the program control data.
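For the data-table case, the field names can be read directly from the database and used as feature vectors. The sketch below uses Python's built-in `sqlite3` module; the table, the two-level grading rule (grade 1 = no empty field, grade 2 = at least one empty field) and the function names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

def feature_vectors_from_table(conn: sqlite3.Connection, table: str) -> List[str]:
    """Treat the table's field names as the feature vectors."""
    # The table name is interpolated directly: assumed trusted, not user input.
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    return [column[0] for column in cursor.description]

def result_sets(conn: sqlite3.Connection, table: str) -> Dict[int, List[Tuple]]:
    """Split rows into per-grade result sets: grade 1 = no empty field,
    grade 2 = at least one empty field (a hypothetical two-level rule)."""
    fields = feature_vectors_from_table(conn, table)
    sets: Dict[int, List[Tuple]] = {1: [], 2: []}
    for row in conn.execute(f"SELECT {', '.join(fields)} FROM {table}"):
        empty = any(value is None or str(value).strip() == "" for value in row)
        sets[2 if empty else 1].append(row)
    return sets
```

A real grading step would replace the emptiness test with checks against the configured grade standards.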
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the invention. Those of ordinary skill in the art may modify the technical solution of the invention or make equivalent substitutions for some of its technical features, and such modifications or substitutions do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the invention. Accordingly, if these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them.
Claims (4)
1. A data quality grade judging method, characterized by comprising the following steps:
sending the obtained target data group to a server;
extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to it;
checking each item of character set data against the grade standard stored in the server to obtain the quality grade code of each feature vector; and
returning each target data item and its corresponding quality grade code to the user.
2. The data quality grade judging method according to claim 1, characterized by further comprising the step of: specifying in advance, according to the feature vectors of each target data group, the grade standard to be used for checking.
3. The data quality grade judging method according to claim 1, characterized by further comprising the step of: if no grade standard has been specified in advance according to the feature vectors of the target data group, falling back to the default setting, in which the server automatically selects the grade standard with the widest range for checking.
4. The data quality grade judging method according to claim 2 or 3, characterized in that the quality grade codes are ordered by the degree of similarity with which each item of character set data matches the grade standard.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110023938.1A CN102609418B (en) | 2011-01-21 | 2011-01-21 | Data quality grade judging method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102609418A | 2012-07-25 |
CN102609418B | 2015-02-04 |
Family ID: 46526800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110023938.1A Active CN102609418B (en) | 2011-01-21 | 2011-01-21 | Data quality grade judging method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102609418B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1698111A (en) * | 2001-12-05 | 2005-11-16 | 皇家飞利浦电子股份有限公司 | Method and apparatus for verifying the integrity of system data |
CN101286156A (en) * | 2007-05-29 | 2008-10-15 | 北大方正集团有限公司 | Method for removing repeated object based on metadata |
CN101576893A (en) * | 2008-05-09 | 2009-11-11 | 北京世纪拓远软件科技发展有限公司 | Method and system for analyzing data quality |
Also Published As
Publication number | Publication date |
---|---|
CN102609418B (en) | 2015-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112199375B (en) | Cross-modal data processing method and device, storage medium and electronic device | |
US10460029B2 (en) | Reply information recommendation method and apparatus | |
US20200234002A1 (en) | Optimization techniques for artificial intelligence | |
US11915104B2 (en) | Normalizing text attributes for machine learning models | |
CN104933152A (en) | Named entity recognition method and device | |
CN106202380B (en) | Method and system for constructing classified corpus and server with system | |
CN103593412B (en) | A kind of answer method and system based on tree structure problem | |
CN111026886A (en) | Multi-round dialogue processing method for professional scene | |
CN107004141A (en) | To the efficient mark of large sample group | |
CN113536081B (en) | Data center data management method and system based on artificial intelligence | |
CN110874536B (en) | Corpus quality evaluation model generation method and double-sentence pair inter-translation quality evaluation method | |
Zarisheva et al. | Dialog act annotation for twitter conversations | |
CN116049345B (en) | Document-level event joint extraction method and system based on bidirectional event complete graph | |
CN112906393A (en) | Meta learning-based few-sample entity identification method | |
CN114282513A (en) | Text semantic similarity matching method and system, intelligent terminal and storage medium | |
CN109063772A (en) | A kind of image individuation semantic analysis, device and equipment based on deep learning | |
CN111597336A (en) | Processing method and device of training text, electronic equipment and readable storage medium | |
CN102609418A (en) | Data quality grade judging method | |
CN115438645A (en) | Text data enhancement method and system for sequence labeling task | |
CN111401069A (en) | Intention recognition method and intention recognition device for conversation text and terminal | |
CN115794997A (en) | Enterprise matching degree processing method and device based on enterprise labels | |
CN114547391A (en) | Message auditing method and device | |
CN107608955B (en) | Inter-translation method and device for named entities in Hanzang | |
CN116992874B (en) | Text quotation auditing and tracing method, system, device and storage medium | |
CN116149258B (en) | Numerical control machine tool code generation method based on multi-mode information and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | | Address after: 2nd Floor, No. 1, Fourth Street, Haidian District, Beijing 100085. Patentee after: Beijing Duxiu Technology Co., Ltd. Address before: C-710, Jiahua Building, No. 9 Shangdi San Jie, Haidian District, Beijing 100085. Patentee before: Beijing Duxiu Technology Co., Ltd. |
CP02 | Change in the address of a patent holder | |