CN111581195A - Method, system and device for quality inspection marking data - Google Patents
Method, system and device for quality inspection marking data
- Publication number
- CN111581195A (application CN202010353465.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- quality inspection
- labeling
- labeled
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a method for quality inspection of labeled data, which performs the quality inspection through the following steps: step 101, labeling standard data according to a standard rule and outputting the result, designating the original data as quality inspection data and the labeling result as the quality inspection data standard; step 102, inserting the quality inspection data into the data to be labeled, wherein the quality inspection data is standard data that has a standard labeling result and carries a tag; step 103, labeling the data, comprising the quality inspection data and the data to be labeled, to obtain a labeling result; and step 104, extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard to obtain a comparison result, which is the quality inspection result of the labeled data. The invention also discloses a system and a device adopting the method. The advantages of the invention are that misjudgment caused by the subjectivity of manual quality inspection is largely avoided, the accuracy of the labeled data is ensured, the efficiency of quality inspection is improved, and the time consumed is reduced.
Description
Technical Field
The invention relates to the field of data labeling, and in particular to a method, a system and a device for quality inspection of labeled data.
Background
In the prior art, quality inspection of labeled data is based on sampling. There are roughly two sampling methods: random sampling and stratified sampling (for a classification problem, stratified sampling means that each label finally draws n × p/s items, where n is the total number of labels to be drawn, s is the total number of labels (total number of samples), and p is the percentage to be drawn). Both methods, however, suffer from low accuracy and long time consumption. As to accuracy, an inspection carried out by a single quality inspector is more subjective than one carried out by several inspectors, and because the standards of individual annotators may differ, the quality inspection results may differ as well. As to time, inspecting a large amount of data takes a relatively long time.
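For orientation, the prior-art stratified sampling mentioned above might be sketched as follows. This is only an illustration under the simplifying assumption that the same fraction p is drawn from every label; it does not try to reproduce the exact n × p/s expression, and the function name and data layout are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, p, seed=0):
    """Draw roughly a fraction p of the items from every label.

    items: list of (text, label) pairs; p: fraction in (0, 1].
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in items:
        by_label[label].append((text, label))
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * p))  # at least one item per label
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical labeled data: 20 items of one label, 10 of another.
data = [(f"q{i}", "price") for i in range(20)] + [(f"r{i}", "symptom") for i in range(10)]
picked = stratified_sample(data, p=0.1)
```

Every item in `picked` would still have to be reviewed by a person, which is the cost the invention aims to reduce.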
Disclosure of Invention
The technical problem to be solved by the present invention is how to improve the accuracy and efficiency of quality inspection of labeled data; to this end, a method, a system and a device for quality inspection of labeled data are provided.
To achieve this purpose, the invention provides the following technical solution: a method for quality inspection of labeled data, which performs the quality inspection through the following steps:
step 101, labeling standard data according to a standard rule and outputting the result, designating the original data as quality inspection data and the labeling result as the quality inspection data standard;
step 102, inserting the quality inspection data into the data to be labeled, wherein the quality inspection data is standard data that has a standard labeling result and carries a tag;
step 103, labeling the data, comprising the quality inspection data and the data to be labeled, to obtain a labeling result;
and step 104, extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard to obtain a comparison result, which is the quality inspection result of the labeled data.
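The flow of steps 102 to 104 can be illustrated with a minimal sketch. It assumes that the quality inspection data from step 101 already exists as (text, standard label) pairs; the function names, the `is_qc` tag field, the example labels and the toy annotator are all hypothetical and not taken from the patent.

```python
import random

def insert_quality_items(to_label, gold, seed=0):
    """Step 102: mix tagged quality inspection items into the data to be labeled."""
    batch = [{"id": f"d{i}", "text": t, "is_qc": False} for i, t in enumerate(to_label)]
    batch += [{"id": f"g{i}", "text": t, "is_qc": True} for i, (t, _) in enumerate(gold)]
    random.Random(seed).shuffle(batch)  # the annotator cannot tell the items apart
    return batch

def run_quality_check(batch, gold, annotate):
    """Steps 103-104: label everything, then compare only the tagged items."""
    standard = {f"g{i}": label for i, (_, label) in enumerate(gold)}
    # Step 103: the annotator sees only the text, never the tag.
    results = {item["id"]: annotate(item["text"]) for item in batch}
    # Step 104: extract the tagged items and compare with the standard.
    checked = [item["id"] for item in batch if item["is_qc"]]
    correct = sum(results[i] == standard[i] for i in checked)
    return correct / len(checked)

gold = [("how much is a routine blood test?", "price"),
        ("where is the pharmacy?", "location")]
to_label = ["is the clinic open on Sunday?", "how much is an X-ray?"]

def toy_annotator(text):
    # Stand-in for a human or algorithmic annotator.
    return "price" if "how much" in text else "other"

batch = insert_quality_items(to_label, gold)
accuracy = run_quality_check(batch, gold, toy_annotator)
```

In this toy run the annotator gets one of the two quality inspection items right, so the measured accuracy is one half.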
Further, the quality inspection data in step 101 is labeled as follows:
a01, labeling a predetermined number of data items of each category;
a02, checking the labeled data of each category, and reviewing and revising the label definitions one by one to serve as the standard rule;
a03, selecting data of the same categories as the data to be labeled as the standard data.
Further, the quantity of the quality inspection data is 1% -50% of the quantity of the data to be labeled.
Further, the quantity of the quality inspection data is 10% of the quantity of the data to be labeled.
Further, 80% of the quality inspection data is original data, 10% is data in which stop words have been modified, added or deleted, and 10% is data in which common typos have been introduced into the sentences.
Further, the quality inspection result in step 104 includes an accuracy and a labeling bias; the accuracy is the percentage of the quality inspection data for which the labeling result of the tagged quality inspection data agrees with the quality inspection data standard, and the labeling bias is the classification category with a relatively high error rate.
Further, the process in step 104 of extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard is implemented by a computer.
Another object of the present invention is to provide a system for quality inspection of labeled data, comprising a labeling data input module, a quality inspection data insertion module, a labeling data output module, a labeling result input module, a labeling result comparison module and a quality inspection report output module. The labeling data input module is used for inputting data to be labeled. The quality inspection data insertion module is used for inserting quality inspection data according to the quantity and classification of the input data to be labeled; the quality inspection data is generated by the method of claim 1 and includes the corresponding standard labeling results. The labeling data output module is used for outputting the data to be labeled into which the quality inspection data has been inserted. The labeling result input module is used for inputting the labeled data once labeling is complete. The labeling result comparison module is used for comparing the tagged data within the input labeled data with the standard labeling results of the quality inspection data. The quality inspection report output module is used for outputting the quality inspection result report generated from the comparison performed by the labeling result comparison module.
A further object of the invention is to provide a device for quality inspection of labeled data, comprising a memory, a processor and a transmission interface. The memory is used for storing quality inspection data and temporarily storing the input data to be labeled and the input labeled data; the memory also stores a program for comparing the labeling result of the tagged quality inspection data with the quality inspection data standard and a program for generating a quality inspection report. The processor is used for implementing the method for quality inspection of labeled data according to the information stored in the memory, and the transmission interface is used for receiving and outputting data.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, quality inspection data is inserted into the data to be labeled, and during quality inspection only the inserted data is checked, automatically and by computer. This removes the need for manual comparison, improves accuracy, greatly reduces the error rate, and avoids disputes over accuracy. For the annotators, the quality inspection result shows which labels they tend to get wrong, so the quality inspectors can readily compile a quality inspection report and inform the annotators, who can then improve on the labels concerned and raise the quality of their work. Misjudgment by quality inspectors due to subjective factors during data quality inspection is largely avoided and the accuracy of the labeled data can be ensured, which is vital for the subsequent training of machine learning models.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a functional block diagram of the system in embodiment 2 of the present invention;
FIG. 3 is a functional block diagram of the device in embodiment 3 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
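Embodiment 1
As shown in FIG. 1, this embodiment discloses a method for quality inspection of labeled data, which performs the quality inspection through the following steps:
Step 101, labeling standard data according to a standard rule and outputting the result, designating the original data as quality inspection data and the labeling result as the quality inspection data standard. The quality inspection data is labeled as follows:
a01, labeling a predetermined number of data items of each category;
a02, checking the labeled data of each category, and reviewing and revising the label definitions one by one to serve as the standard rule;
a03, selecting data of the same categories as the data to be labeled as the standard data.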
The quantity of the quality inspection data is 1% to 50% of the quantity of the data to be labeled. The specific proportion can be chosen according to actual needs; in principle, the more data is inserted, the closer the quality inspection result is to the truth. In this embodiment, the quantity of quality inspection data is preferably 10% of the quantity of the data to be labeled; at this proportion the quality inspection result stays close to the truth while excessive waste of human resources is avoided. At this stage a certain amount of quality inspection data is labeled. This data can come from the trial labeling that annotators carry out when the labeling rules are first drawn up, which serves to unify the annotators' labeling conventions and to revise the label definitions: some labels turn out to cover too little data and others to be subdivided too finely, so problems in the current version of the labeling specification are exposed and the labeling rules can be refined through repeated labeling. The number of trial-labeled items is 500 per category; the specific number may depend on the total amount of labeled data required and on the specific task.
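The trade-off described above, where more inserted data gives a result closer to the truth at higher labeling cost, can be made concrete with a short sketch. It assumes, as a simplification not stated in the patent, that the quality inspection items behave like independent trials, so the measured accuracy has the usual binomial standard error; both function names are hypothetical.

```python
import math

def quality_item_count(n_to_label, ratio=0.10):
    """Number of quality inspection items for n_to_label items to be labeled."""
    if not 0.01 <= ratio <= 0.50:
        raise ValueError("ratio is expected to lie between 1% and 50%")
    return max(1, round(n_to_label * ratio))

def accuracy_standard_error(accuracy, n_quality_items):
    """Binomial standard error of an accuracy measured on n_quality_items items."""
    return math.sqrt(accuracy * (1 - accuracy) / n_quality_items)

# Compare the lower bound, the preferred value and the upper bound of the range.
for ratio in (0.01, 0.10, 0.50):
    m = quality_item_count(10_000, ratio)
    print(f"ratio={ratio:.0%}  items={m}  std_error={accuracy_standard_error(0.9, m):.4f}")
```

Under this assumption the uncertainty shrinks only with the square root of the number of inserted items, which is one way to read the choice of a moderate proportion such as 10%.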
Step 102, inserting the quality inspection data into the data to be labeled, wherein the quality inspection data is standard data that has a standard labeling result and carries a tag. The quantity of inserted quality inspection data is 10% of the data actually required to be labeled, i.e., 10% quality inspection data and 90% data actually to be labeled. As one feasible arrangement, 8 of these 10 percentage points are taken unchanged from the quality inspection data. The remaining 2 percentage points are also taken from the quality inspection data, but in 1 percentage point the stop words are modified, added or deleted (for example, deleting or adding particles such as 'o' or other function words) without affecting the meaning of the sentence, and in the other 1 percentage point common typos are introduced into the sentence, for example 'how to treat prostate' is changed to 'how to treat the column of money' (rendered literally from the miswritten original), again without affecting the meaning of the sentence.
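The 80% / 10% / 10% composition of the inserted quality inspection data might be assembled as in the following sketch. The stop-word list, the typo map and all function names are hypothetical placeholders; a real deployment would use perturbations suited to the language of the data. The standard label of each item is left untouched, since the perturbations are meant to preserve meaning.

```python
import random

STOP_WORDS = ("oh", "ah", "well")          # hypothetical filler words
TYPO_MAP = {"prostate": "prostrate"}       # hypothetical common misspellings

def perturb_stop_word(text, rng):
    """Prepend a filler word; the meaning of the sentence is unchanged."""
    return f"{rng.choice(STOP_WORDS)} {text}"

def perturb_typo(text):
    """Replace the first known word with its common misspelling."""
    for right, wrong in TYPO_MAP.items():
        if right in text:
            return text.replace(right, wrong, 1)
    return text

def build_quality_set(gold, n, seed=0):
    """Pick n gold items: 80% unchanged, 10% stop-word variants, 10% typo variants."""
    rng = random.Random(seed)
    chosen = rng.sample(gold, n)
    n_stop = n_typo = n // 10
    out = []
    for i, (text, label) in enumerate(chosen):
        if i < n_stop:
            out.append((perturb_stop_word(text, rng), label, "stop_word"))
        elif i < n_stop + n_typo:
            out.append((perturb_typo(text), label, "typo"))
        else:
            out.append((text, label, "original"))
    return out

gold = [(f"how to treat prostate, case {i}", "treatment") for i in range(50)]
quality_set = build_quality_set(gold, n=20)
```

Keeping the kind of perturbation alongside each item makes it possible to report later whether annotators are thrown off by typos or filler words.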
Step 103, labeling the data, comprising the quality inspection data and the data to be labeled, to obtain a labeling result. In this labeling process the corresponding data is distributed to an annotator for labeling; the annotator may be a natural person, or the labeling may be performed by a labeling algorithm such as a neural network, i.e., by a computer.
Step 104, extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard to obtain a comparison result, which is the quality inspection result of the labeled data. Feasibly, the comparison checks whether the two are identical or identical in meaning. For example, for the quality inspection item "how much does a routine blood test cost?", the correct label is "consulting and checking price"; if the label assigned by the annotator differs from it, the labeling result is incorrect. Therefore, during quality inspection it is only necessary to compare the labels of the same sentence: identical labels indicate a correct labeling. Preferably, in this embodiment the comparison is performed by a computer, in which case it is only necessary to compare whether the label assigned by the annotator is the same as the label of the quality inspection data.
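The computer comparison described in step 104 reduces to an equality check per tagged item, as the following sketch suggests. The dictionary layout, the item identifiers and the second example label are assumptions made for illustration only.

```python
def compare_with_standard(labeled, standard):
    """Compare annotator labels with the standard labels of the tagged items.

    labeled:  {item_id: label assigned by the annotator} for the whole batch
    standard: {item_id: standard label} for quality inspection items only
    """
    records = []
    for item_id, expected in standard.items():
        actual = labeled.get(item_id)          # None if the item was skipped
        records.append({
            "id": item_id,
            "expected": expected,
            "actual": actual,
            "correct": actual == expected,     # plain equality of labels
        })
    return records

standard = {"g0": "consulting and checking price", "g1": "consulting department"}
labeled = {
    "d0": "consulting department",             # ordinary item, not checked
    "g0": "consulting and checking price",     # matches the standard
    "g1": "consulting and checking price",     # differs from the standard
}
records = compare_with_standard(labeled, standard)
```

Only the two tagged items are examined; the ordinary item `d0` is ignored because it has no standard label to compare against.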
In addition, the quality inspection result includes an accuracy and a labeling bias. The accuracy is the percentage of the quality inspection data for which the labeling result of the tagged quality inspection data agrees with the quality inspection data standard; for example, if correct labels account for 90% of the total in the quality inspection result, the accuracy is 90%. The labeling bias is the classification category with a relatively high error rate. The labeling bias need not be a category bias only; it may also be another bias, such as a bias toward high-frequency typos or a deeper bias toward semantic errors. The quality inspection report can be compiled uniformly from the computer's comparison results, and the analysis may be a two-dimensional analysis, which is easy to implement.
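A quality inspection result with accuracy and labeling bias could be computed along the following lines. The sketch treats the labeling bias as the list of categories ordered by error rate, which is one possible reading of "a relatively high error rate"; the function name and the example labels are hypothetical.

```python
from collections import Counter

def quality_report(records):
    """records: iterable of (expected_label, actual_label) for quality inspection items."""
    records = list(records)
    total = Counter(expected for expected, _ in records)
    wrong = Counter(expected for expected, actual in records if actual != expected)
    n_wrong = sum(wrong.values())
    accuracy = (len(records) - n_wrong) / len(records)
    error_rate = {label: wrong[label] / total[label] for label in total}
    # Labeling bias: categories ordered from highest to lowest error rate.
    bias = sorted(error_rate, key=error_rate.get, reverse=True)
    return {"accuracy": accuracy, "error_rate": error_rate, "bias": bias}

# Ten checked items, nine labeled correctly, mirroring the 90% example in the text.
records = ([("price", "price")] * 5
           + [("symptom", "symptom")] * 4
           + [("symptom", "price")])
report = quality_report(records)
```

Here the single mistake falls in the "symptom" category, so that category heads the bias list and would be the one to raise with the annotator.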
According to the invention, quality inspection data is inserted into the data to be labeled, and during quality inspection only the inserted data is checked, automatically and by computer. This removes the need for manual comparison, improves accuracy, greatly reduces the error rate, and avoids disputes over accuracy. For the annotators, the quality inspection result shows which labels they tend to get wrong, so the quality inspectors can readily compile a quality inspection report and inform the annotators, who can then improve on the labels concerned and raise the quality of their work. Misjudgment by quality inspectors due to subjective factors during data quality inspection is largely avoided and the accuracy of the labeled data can be ensured, which is vital for the subsequent training of machine learning models.
Embodiment 2
Referring to the functional block diagram of the system shown in FIG. 2, this embodiment discloses a system for quality inspection of labeled data, comprising a labeling data input module, a quality inspection data insertion module, a labeling data output module, a labeling result input module, a labeling result comparison module and a quality inspection report output module. The labeling data input module is used for inputting data to be labeled. The quality inspection data insertion module is used for inserting quality inspection data according to the quantity and classification of the input data to be labeled; the quality inspection data is generated by the method according to claim 1 and includes the corresponding standard labeling results. The labeling data output module is used for outputting the data to be labeled into which the quality inspection data has been inserted. The labeling result input module is used for inputting the labeled data once labeling is complete. The labeling result comparison module is used for comparing the tagged data within the input labeled data with the standard labeling results of the quality inspection data. The quality inspection report output module is used for outputting the quality inspection result report generated from the comparison performed by the labeling result comparison module.
The quality inspection data inserted by the quality inspection data insertion module may come from a storage device inside the system or from a storage device outside the system, such as cloud storage or a network; it may also be held in a cache after being fetched from outside and deleted automatically once the quality inspection report has been formed.
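The six modules of this embodiment can be pictured as methods of a single object, as in the sketch below. This is a simplified illustration rather than the patented implementation: the class name, its fields and the in-memory `gold` dictionary (standing in for the storage from which quality inspection data is read) are assumptions, and it further assumes that no quality inspection text coincides with a text to be labeled.

```python
import random
from dataclasses import dataclass, field

@dataclass
class QualityInspectionSystem:
    gold: dict                      # {text: standard label}; source of inserted items
    ratio: float = 0.10             # share of quality inspection data to insert
    seed: int = 0
    _tagged: dict = field(default_factory=dict)

    def input_data(self, texts):                    # labeling data input module
        self._pending = list(texts)

    def insert_quality_data(self):                  # quality inspection data insertion module
        k = max(1, round(len(self._pending) * self.ratio))
        rng = random.Random(self.seed)
        picked = rng.sample(sorted(self.gold), k)
        self._tagged = {t: self.gold[t] for t in picked}
        self._pending += picked
        rng.shuffle(self._pending)

    def output_data(self):                          # labeling data output module
        return list(self._pending)

    def input_results(self, labeled):               # labeling result input module
        self._labeled = dict(labeled)

    def compare(self):                              # labeling result comparison module
        return {t: self._labeled.get(t) == std for t, std in self._tagged.items()}

    def report(self):                               # quality inspection report output module
        outcome = self.compare()
        return {"checked": len(outcome), "accuracy": sum(outcome.values()) / len(outcome)}

system = QualityInspectionSystem(gold={"how much is a blood test?": "price",
                                       "where is radiology?": "location"})
system.input_data([f"question {i}" for i in range(10)])
system.insert_quality_data()
batch = system.output_data()
# A stand-in annotator that happens to label every quality inspection item correctly.
system.input_results({text: system.gold.get(text, "other") for text in batch})
report = system.report()
```

Dropping `_tagged` after `report()` would correspond to the cached quality inspection data being deleted once the report has been formed.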
Embodiment 3
Referring to the functional block diagram of the device shown in FIG. 3, this embodiment discloses a device for quality inspection of labeled data, comprising a memory, a processor and a transmission interface. The memory is used for storing quality inspection data and temporarily storing the input data to be labeled and the input labeled data; the memory also stores a program for comparing the labeling result of the tagged quality inspection data with the quality inspection data standard and a program for generating a quality inspection report. The memory may be a storage device such as a solid state disk, a mechanical hard disk or RAM, or an external storage device such as cloud storage on a network. The processor is used for implementing the method for quality inspection of labeled data according to the information stored in the memory; it may be the processor of a computer or the processor of a mobile device, i.e., the device may be a mobile terminal or a computer. The transmission interface is used for receiving and outputting data, and may be a physical data interface such as USB or a virtual interface such as a network interface.
In addition, the device also includes a power supply, a display, an input device and the like, which are conventional components of a computer or mobile terminal whose use is known to those skilled in the art and which are therefore not described in detail here.
In addition, the data to be labeled into which quality inspection data has been inserted need not be output from the device for labeling; it may be labeled by a labeling algorithm stored in the device, or displayed on the display so that a natural person can label it, with the quality inspection carried out directly once labeling is finished. That is, labeling and quality inspection can both be performed by different programs installed on the same computer.
Completing the quality inspection automatically by machine largely avoids the misjudgment caused by the subjectivity of manual quality inspection.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of protection of the invention.
Claims (9)
1. A method for quality inspection of labeled data, characterized in that the quality inspection of the labeled data is performed through the following steps:
step 101, labeling standard data according to a standard rule and outputting the result, designating the original data as quality inspection data and the labeling result as the quality inspection data standard;
step 102, inserting the quality inspection data into the data to be labeled, wherein the quality inspection data is standard data that has a standard labeling result and carries a tag;
step 103, labeling the data, comprising the quality inspection data and the data to be labeled, to obtain a labeling result;
and step 104, extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard to obtain a comparison result, which is the quality inspection result of the labeled data.
2. The method according to claim 1, wherein the quality inspection data in step 101 is labeled as follows:
a01, labeling a predetermined number of data items of each category;
a02, checking the labeled data of each category, and reviewing and revising the label definitions one by one to serve as the standard rule;
a03, selecting data of the same categories as the data to be labeled as the standard data.
3. The method of claim 1, wherein the quantity of the quality inspection data is 1% -50% of the quantity of the data to be labeled.
4. The method of claim 3, wherein the quality inspection data is 10% of the data to be labeled.
5. The method according to any one of claims 1 to 4, wherein 80% of the quality inspection data is original data, 10% of the quality inspection data is data in which stop words have been modified, added or deleted, and 10% of the quality inspection data is data in which common typos have been introduced into the sentences.
6. The method of claim 1, wherein the quality inspection result in step 104 comprises an accuracy and a labeling bias, the accuracy is the percentage of the quality inspection data for which the labeling result of the tagged quality inspection data agrees with the quality inspection data standard, and the labeling bias is the classification category with a relatively high error rate.
7. The method of claim 1, wherein the process in step 104 of extracting the labeling result of the tagged quality inspection data and comparing it with the quality inspection data standard is implemented by a computer.
8. A quality inspection labeling data system is characterized by comprising a labeling data input module, a quality inspection data insertion module, a labeling data output module, a labeling result input module, a labeling result comparison module and a quality inspection report output module, wherein the labeling data input module is used for inputting data to be labeled, the quality inspection data insertion module is used for inserting quality inspection data according to the quantity and classification of the input data to be labeled, the quality inspection data is generated by the method of claim 1, the quality inspection data comprises corresponding standard labeling results, the labeling data output module is used for outputting the data to be labeled with the inserted quality inspection data, the labeling result input module is used for inputting labeling data with labeling completion, the labeling result comparison module is used for comparing the data with labels in the input labeling data with the standard labeling results of the quality inspection data, and the quality inspection report output module is used for outputting a quality inspection result report generated by comparison of the labeling result comparison module.
9. A device for quality inspection labeling data, which is characterized by comprising a memory, a processor and a transmission interface, wherein the memory is used for storing quality inspection data, temporarily storing input data to be labeled and input labeled data, the memory is also stored with a program for comparing a labeling result of the quality inspection data with a label with a quality inspection data standard and a program for generating a quality inspection report, the processor is used for realizing the method in claim 1 according to the information stored in the memory, and the transmission interface is used for accessing and outputting the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010353465.0A CN111581195A (en) | 2020-04-29 | 2020-04-29 | Method, system and device for quality inspection marking data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010353465.0A CN111581195A (en) | 2020-04-29 | 2020-04-29 | Method, system and device for quality inspection marking data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111581195A true CN111581195A (en) | 2020-08-25 |
Family
ID=72122581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010353465.0A Pending CN111581195A (en) | 2020-04-29 | 2020-04-29 | Method, system and device for quality inspection marking data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111581195A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170351681A1 (en) * | 2016-06-03 | 2017-12-07 | International Business Machines Corporation | Label propagation in graphs |
CN109086814A (en) * | 2018-07-23 | 2018-12-25 | 腾讯科技(深圳)有限公司 | A kind of data processing method, device and the network equipment |
CN109684947A (en) * | 2018-12-11 | 2019-04-26 | 广州景骐科技有限公司 | Mark quality control method, device, computer equipment and storage medium |
CN109815487A (en) * | 2018-12-25 | 2019-05-28 | 平安科技(深圳)有限公司 | Text quality detecting method, electronic device, computer equipment and storage medium |
CN110457494A (en) * | 2019-08-01 | 2019-11-15 | 新华智云科技有限公司 | Data mask method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200825 |