CN111581195A - Method, system and device for quality inspection marking data - Google Patents

Publication number
CN111581195A
Authority
CN
China
Prior art keywords
data
quality inspection
labeling
labeled
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010353465.0A
Other languages
Chinese (zh)
Inventor
陈鑫
肖龙源
廖斌
李稀敏
刘晓葳
谭玉坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010353465.0A
Publication of CN111581195A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a method for quality inspection of labeled data, which realizes the quality inspection of the labeled data through the following steps: step 101, labeling standard data according to a standard rule and outputting a result, taking the original data as quality inspection data and taking the labeling result as the quality inspection data standard; step 102, inserting the quality inspection data into data to be labeled, wherein the quality inspection data is standard data with a standard labeling result and is provided with an identifying label; step 103, labeling the data, including the quality inspection data and the data to be labeled, to obtain a labeling result; and step 104, extracting the labeling result of the quality inspection data carrying the identifying label, and comparing it with the quality inspection data standard to obtain a comparison result, namely the quality inspection result of the labeled data. The invention also discloses a system and a device adopting the method. The advantages of the invention are that misjudgment caused by the subjectivity of manual quality inspection is largely avoided, the accuracy of the labeled data is ensured, the efficiency of quality inspection is improved, and the time consumed is reduced.

Description

Method, system and device for quality inspection marking data
Technical Field
The invention relates to the field of data labeling, and in particular to a method, a system and a device for quality inspection of labeled data.
Background
In the prior art, quality inspection of labeled data is based on sampling. There are roughly two sampling methods: random sampling and stratified sampling (for a classification problem, stratified sampling means that the amount of data finally drawn for each label is n × p/s, where n is the total number of labels to be drawn, s is the total number of labels (the total number of samples), and p is the percentage to be drawn). However, both methods suffer from low accuracy and long time consumption. Regarding accuracy, a single quality inspector is more subjective than several quality inspectors, and the standards of the labelers may differ, so the quality inspection results may differ as well. Regarding time, inspecting a large amount of data takes correspondingly longer.
Disclosure of Invention
The technical problem to be solved by the present invention is how to improve the accuracy and efficiency of quality inspection of labeled data; to this end, a method, a system and a device for quality inspection of labeled data are provided.
To achieve this purpose, the invention provides the following technical solution: a method for quality inspection of labeled data, which realizes quality inspection of labeled data through the following steps:
step 101, labeling standard data according to a standard rule and outputting a result, taking the original data as quality inspection data and taking the labeling result as the quality inspection data standard;
step 102, inserting the quality inspection data into data to be labeled, wherein the quality inspection data is standard data with a standard labeling result and is provided with an identifying label;
step 103, labeling the data, including the quality inspection data and the data to be labeled, to obtain a labeling result;
step 104, extracting the labeling result of the quality inspection data carrying the identifying label, and comparing it with the quality inspection data standard to obtain a comparison result, namely the quality inspection result of the labeled data.
Further, the quality inspection data in step 101 is specifically labeled as follows:
a01, labeling a predetermined number of data of each category;
a02, checking each labeled category data, and checking and modifying the label definition one by one as a standard rule;
a03, selecting the data with the same category as the data to be labeled as standard data.
Further, the quantity of the quality inspection data is 1% -50% of the quantity of the data to be labeled.
Further, the quantity of the quality inspection data is 10% of the quantity of the data to be labeled.
Further, 80% of the quality inspection data is original data, 10% of the quality inspection data is data with stop words modified/added/deleted, and 10% of the quality inspection data is data with common wrongly-written characters added into sentences.
Further, the quality inspection result in step 104 includes an accuracy rate and a labeling bias; the accuracy rate is the percentage of the quality inspection data whose labeling result is consistent with the quality inspection data standard after comparison, relative to the total amount of quality inspection data; the labeling bias is the classification category with a higher error ratio.
Further, the process of extracting the labeling result of the quality inspection data with the label in step 104 and comparing the labeling result with the quality inspection data standard is implemented by a computer.
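Another object of the present invention is to provide a system for quality inspection of labeled data, comprising a labeled data input module, a quality inspection data insertion module, a labeled data output module, a labeled result input module, a labeled result comparison module and a quality inspection report output module, wherein the labeled data input module is used for inputting data to be labeled; the quality inspection data insertion module is used for inserting quality inspection data according to the quantity and classification of the input data to be labeled, the quality inspection data being generated by the method described above and including the corresponding standard labeling results; the labeled data output module is used for outputting the data to be labeled into which the quality inspection data has been inserted; the labeled result input module is used for inputting the labeled data after labeling is completed; the labeled result comparison module is used for comparing the data carrying the identifying label in the input labeled data with the standard labeling results of the quality inspection data; and the quality inspection report output module is used for outputting the quality inspection report generated from the comparison performed by the labeled result comparison module.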
The invention further aims to provide a device for quality inspection labeling data, which comprises a memory, a processor and a transmission interface, wherein the memory is used for storing quality inspection data and temporarily storing input data to be labeled and input labeled data, the memory is also used for storing a program for comparing a labeling result of the quality inspection data with a label with a quality inspection data standard and a program for generating a quality inspection report, the processor is used for realizing the method for quality inspection labeling data according to the information stored in the memory, and the transmission interface is used for accessing and outputting the data.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, quality inspection data is inserted into the data to be labeled, and only the inserted data is inspected, automatically, during quality inspection. The inspection can be performed by a computer, which frees personnel from manual comparison, improves accuracy, greatly reduces the error rate, and avoids disputes over accuracy. For the labelers, the quality inspection result shows which labels they tend to get wrong; quality inspectors can conveniently compile the quality inspection report and inform the labelers, who can then improve on the mislabeled categories, which raises labeling quality. Misjudgment by quality inspectors due to subjective reasons is largely avoided, the accuracy of the labeled data is ensured, and the quality inspection result plays a vital role in the subsequent training of machine learning models.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a functional block diagram of the system in embodiment 2 of the present invention;
FIG. 3 is a functional block diagram of the device in embodiment 3 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, this embodiment discloses a method for quality inspection of labeled data, which implements quality inspection of labeled data through the following steps:
Step 101, labeling standard data according to a standard rule and outputting a result, taking the original data as quality inspection data and taking the labeling result as the quality inspection data standard; feasibly, the labeling result can be recorded together with the quality inspection data as its label. The quality inspection data is specifically labeled as follows:
a01, labeling a predetermined number of data of each category;
a02, checking each labeled category data, and checking and modifying the label definition one by one as a standard rule;
a03, selecting the data with the same category as the data to be labeled as standard data.
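Sub-step a03 can be illustrated with a minimal Python sketch. It assumes that the items have already been labeled and reviewed against the standard rule (a01 and a02), and it only keeps reviewed items whose category also occurs in the data to be labeled. The function name `select_standard_data`, the `(text, label)` pair layout and the `per_category` cap are illustrative assumptions, not part of the patent text.

```python
from collections import defaultdict

def select_standard_data(reviewed_items, target_categories, per_category):
    """Keep reviewed (text, label) pairs whose label is one of the target categories.

    reviewed_items: pairs that were already labeled and checked (a01, a02).
    target_categories: categories that occur in the data to be labeled (a03).
    per_category: assumed cap on how many items to keep per category.
    """
    by_label = defaultdict(list)
    for text, label in reviewed_items:
        if label in target_categories:
            by_label[label].append((text, label))
    standard = []
    for label in sorted(by_label):  # sorted only to make the output order stable
        standard.extend(by_label[label][:per_category])
    return standard

reviewed = [
    ("How much is a routine blood test?", "price"),
    ("Where is the clinic?", "address"),
    ("How much is an X-ray?", "price"),
    ("Can I book for tomorrow?", "appointment"),
]
standard = select_standard_data(reviewed, {"price", "address"}, per_category=1)
print(standard)
```

Here the "appointment" item is dropped because that category is not among the categories of the data to be labeled.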
The quantity of the quality inspection data is 1%-50% of the quantity of the data to be labeled; the specific proportion can be selected according to actual needs, and in principle, the larger the amount of inserted data, the closer the quality inspection result is to the true labeling quality. In this embodiment, the quantity of the quality inspection data is preferably 10% of the quantity of the data to be labeled; at this ratio, the quality inspection result is close to the true quality while excessive waste of human resources is avoided. A certain amount of quality inspection data is labeled at this stage. This data can come from the trial labeling that labelers perform when the labeling rules are first drawn up, in order to unify the labelers' labeling specification and revise the label definitions: data for some labels may be missing or sparse, some labels may be subdivided too finely, and problems in the current version of the labeling specification are discovered, so the labeling rules can be optimized through repeated labeling. The number of trial-labeled items is 500 per category; the specific number can be set according to the total amount of labeled data required and the specific task.
The quantity of the inserted quality inspection data is 10% of the actually required labeled data, that is, 10% quality inspection data and 90% actual data to be labeled. Feasibly, of this 10%, 8% is unmodified quality inspection data; the remaining 2% also comes from the quality inspection data, but 1% has stop words modified, added or deleted (for example, deleting or adding particles such as "o", or prepositions) without affecting the meaning of the sentence, and 1% has common wrongly written characters added to the sentence, for example "how to treat prostate" is rewritten with a wrongly written character (machine-translated as "how to treat the column of money"), again without affecting the meaning of the sentence.
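The 80% / 10% / 10% composition described above can be sketched as follows. This is a minimal illustration under stated assumptions: the quality inspection count is taken as 10% of the number of items to be labeled, and the two perturbation helpers (`perturb_stop_word`, `perturb_typo`) and the `typo_map` are hypothetical stand-ins for whatever stop-word and wrongly-written-character rules are actually used.

```python
import random

def perturb_stop_word(text):
    # Hypothetical stand-in: add a filler word that does not change the meaning.
    return "Oh, " + text

def perturb_typo(text, typo_map):
    # Replace the first word found in typo_map with its common misspelling.
    for right, wrong in typo_map.items():
        if right in text:
            return text.replace(right, wrong, 1)
    return text

def build_qc_set(standard, n_to_label, typo_map, ratio=0.10, seed=0):
    """Pick quality inspection items and perturb one tenth each way.

    standard: list of (text, label) pairs with standard labels;
    it must contain at least round(n_to_label * ratio) items.
    """
    rng = random.Random(seed)
    n_qc = round(n_to_label * ratio)
    picked = rng.sample(standard, n_qc)
    n_stop = n_qc // 10   # 10% with stop words changed
    n_typo = n_qc // 10   # 10% with a wrongly written word
    qc = []
    for i, (text, label) in enumerate(picked):
        if i < n_stop:
            qc.append({"text": perturb_stop_word(text), "label": label, "kind": "stop_word"})
        elif i < n_stop + n_typo:
            qc.append({"text": perturb_typo(text, typo_map), "label": label, "kind": "typo"})
        else:
            qc.append({"text": text, "label": label, "kind": "original"})
    return qc

standard = [(f"Question {i} about the price", "price") for i in range(50)]
qc = build_qc_set(standard, n_to_label=200, typo_map={"price": "prise"})
print(len(qc), [q["kind"] for q in qc].count("original"))
```

With 200 items to be labeled, the sketch yields 20 quality inspection items: 16 unmodified, 2 with a stop word added and 2 with a misspelling. The standard label is kept unchanged in every case, since the perturbations are meant to leave the meaning of the sentence intact.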
Step 102, inserting the quality inspection data into the data to be labeled, wherein the quality inspection data is standard data with a standard labeling result and is provided with an identifying label; the identifying label of the quality inspection data can be added and recognized through a corresponding label code.
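One possible way to realize step 102 is sketched below. It assumes that the "label code" is simply an item identifier, and that a separate answer key records which identifiers belong to quality inspection items and what their standard labels are; keeping the answer key away from the labeler is a design assumption of this sketch, not something the text specifies. The names `insert_qc`, `batch` and `answer_key` are illustrative.

```python
import random

def insert_qc(to_label, qc_items, seed=0):
    """Mix quality inspection items into the items to be labeled.

    to_label: list of texts without labels.
    qc_items: list of dicts with "text" and "label" (the standard label).
    Returns (batch, answer_key): batch holds id and text only;
    answer_key maps the ids of quality inspection items to their standard labels.
    """
    rng = random.Random(seed)
    combined = [(text, None, False) for text in to_label]
    combined += [(q["text"], q["label"], True) for q in qc_items]
    rng.shuffle(combined)  # spread the quality inspection items through the batch
    batch, answer_key = [], {}
    for idx, (text, label, is_qc) in enumerate(combined):
        item_id = f"item-{idx:05d}"  # assumed form of the label code
        batch.append({"id": item_id, "text": text})
        if is_qc:
            answer_key[item_id] = label
    return batch, answer_key

to_label = ["Is parking available?", "Do you open on Sunday?", "Can I pay by card?"]
qc_items = [
    {"text": "How much is a routine blood test?", "label": "price"},
    {"text": "Where is the clinic?", "label": "address"},
]
batch, answer_key = insert_qc(to_label, qc_items)
print(len(batch), sorted(answer_key.values()))
```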
Step 103, labeling the data, including the quality inspection data and the data to be labeled, to obtain a labeling result. During labeling, the corresponding data is distributed to a labeler for labeling; the labeler can be a natural person, or the labeling can be performed by a computer using a labeling algorithm such as a neural network.
Step 104, extracting the labeling result of the quality inspection data carrying the identifying label, and comparing it with the quality inspection data standard to obtain a comparison result, namely the quality inspection result of the labeled data. Feasibly, the comparison checks whether the two labels are identical or have the same meaning. For example, for the quality inspection data "How much does a routine blood test cost?", the correct label is "consulting and checking price"; if the label assigned by the labeler is a different one, the labeling result is incorrect. Therefore, during quality inspection, only the labels of the same sentence need to be compared; identical labels indicate a correct labeling result. Preferably, in this embodiment, the comparison is performed by a computer, which only needs to check whether the label assigned by the labeler is the same as the label of the quality inspection data.
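The computer-side comparison in step 104 reduces to an exact label match per quality inspection item, as in the following minimal sketch. The dictionary layout (item id mapped to label) and the name `compare_with_standard` are assumptions for illustration, and the wrong label "consulting treatment price" is a made-up example value.

```python
def compare_with_standard(labeled, answer_key):
    """Compare labeler output with the standard labels of quality inspection items.

    labeled: dict mapping item id to the label assigned by the labeler
             (covers the whole batch, not only quality inspection items).
    answer_key: dict mapping quality inspection item ids to standard labels.
    Returns one (id, expected, actual, is_correct) row per quality inspection item.
    """
    rows = []
    for item_id, expected in answer_key.items():
        actual = labeled.get(item_id)  # None if the labeler skipped the item
        rows.append((item_id, expected, actual, actual == expected))
    return rows

answer_key = {
    "item-00001": "consulting and checking price",
    "item-00004": "consulting and checking price",
}
labeled = {
    "item-00000": "other",
    "item-00001": "consulting and checking price",
    "item-00004": "consulting treatment price",  # hypothetical wrong label
}
rows = compare_with_standard(labeled, answer_key)
for row in rows:
    print(row)
```

Only items present in the answer key are checked; ordinary items such as "item-00000" are ignored, which matches the idea that only the inserted data is inspected.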
In addition, the quality inspection result includes an accuracy rate and a labeling bias. The accuracy rate is the percentage of quality inspection data whose labeling result is consistent with the quality inspection data standard, relative to the total amount of quality inspection data; for example, if correct labels account for 90% of the total in the quality inspection result, the accuracy rate is 90%. The labeling bias is the classification category with a high error ratio. The labeling bias is not limited to a category bias; it can also be another bias, such as a bias toward high-frequency wrongly written characters or a deeper bias toward wrong semantics. The quality inspection report can be analyzed uniformly from the computer's comparison results; the analysis can be two-dimensional and is easy to implement.
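The two report quantities can be computed from the comparison rows as in the sketch below. The text only says the labeling bias is the category with a "high error ratio", so the cut-off `bias_threshold` is an assumption introduced here, as are the function name `quality_report` and the row layout `(id, expected, actual, is_correct)`.

```python
from collections import Counter

def quality_report(rows, bias_threshold=0.1):
    """Compute accuracy and labeling bias from comparison rows.

    rows: list of (id, expected_label, actual_label, is_correct).
    bias_threshold: assumed cut-off; categories whose error rate exceeds it
                    are reported as labeling bias.
    """
    total = len(rows)
    correct = sum(1 for row in rows if row[3])
    per_label_total = Counter(row[1] for row in rows)
    per_label_wrong = Counter(row[1] for row in rows if not row[3])
    error_rate = {label: per_label_wrong[label] / n for label, n in per_label_total.items()}
    bias = sorted(label for label, rate in error_rate.items() if rate > bias_threshold)
    return {
        "accuracy": correct / total if total else 0.0,
        "error_rate": error_rate,
        "labeling_bias": bias,
    }

# Ten quality inspection items: five "price" (all correct), five "address" (one wrong).
rows = [(f"p{i}", "price", "price", True) for i in range(5)]
rows += [(f"a{i}", "address", "address", True) for i in range(4)]
rows += [("a4", "address", "price", False)]
report = quality_report(rows)
print(report["accuracy"], report["labeling_bias"])
```

In this example nine of ten items match, so the accuracy is 90%, and "address" is reported as the category the labeler tends to get wrong.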
According to the invention, quality inspection data is inserted into the data to be labeled, and only the inserted data is inspected, automatically, during quality inspection. The inspection can be performed by a computer, which frees personnel from manual comparison, improves accuracy, greatly reduces the error rate, and avoids disputes over accuracy. For the labelers, the quality inspection result shows which labels they tend to get wrong; quality inspectors can conveniently compile the quality inspection report and inform the labelers, who can then improve on the mislabeled categories, which raises labeling quality. Misjudgment by quality inspectors due to subjective reasons is largely avoided, the accuracy of the labeled data is ensured, and the quality inspection result plays a vital role in the subsequent training of machine learning models.
Example 2
Referring to the functional block diagram of the system shown in FIG. 2, this embodiment discloses a system for quality inspection of labeled data, which includes a labeled data input module, a quality inspection data insertion module, a labeled data output module, a labeled result input module, a labeled result comparison module and a quality inspection report output module. The labeled data input module is configured to input data to be labeled. The quality inspection data insertion module is configured to insert quality inspection data according to the quantity and classification of the input data to be labeled; the quality inspection data is generated by the method of Example 1 and includes the corresponding standard labeling results. The labeled data output module is configured to output the data to be labeled into which the quality inspection data has been inserted. The labeled result input module is configured to input the labeled data after labeling is completed. The labeled result comparison module is configured to compare the data carrying the identifying label in the input labeled data with the standard labeling results of the quality inspection data. The quality inspection report output module is configured to output the quality inspection report generated from the comparison performed by the labeled result comparison module.
The data inserted by the quality inspection data insertion module may come from a storage device inside the system or from storage outside the system, such as cloud storage or a network; data accessed from outside may also be held in a cache and automatically deleted after the quality inspection report is formed.
Example 3
Referring to the functional block diagram of the device shown in FIG. 3, this embodiment discloses a device for quality inspection of labeled data, comprising a memory, a processor and a transmission interface. The memory is used for storing quality inspection data and temporarily storing the input data to be labeled and the input labeled data; the memory further stores a program for comparing the labeling result of the quality inspection data carrying the identifying label with the quality inspection data standard, and a program for generating a quality inspection report. The memory may be a storage device such as a solid state disk, a mechanical hard disk or RAM, or external storage such as cloud storage on a network. The processor is used for implementing the method for quality inspection of labeled data according to the information stored in the memory; it may be the processor of a computer or of a mobile device, that is, the device may be a mobile terminal or a computer. The transmission interface is used for data input and output; it may be a physical data interface such as USB, or a virtual interface such as a network interface.
In addition, the apparatus should also include a power supply, a display, an input device, etc., which are all conventional components of a computer or a mobile terminal apparatus, and those skilled in the art should know how to use them, and will not be described in detail in the detailed embodiments of the present invention.
In addition, after the quality inspection data has been inserted, the data to be labeled can be labeled inside the device without being output externally: it can be labeled by a labeling algorithm stored in the device, or displayed on a display for a natural person to label, and quality inspection is carried out directly after labeling is finished. That is, both labeling and quality inspection can be performed by different programs installed on the same computer.
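The single-machine arrangement described above, with insertion, labeling and quality inspection all running in one place, might look like the following sketch. The keyword rule `label_by_rule` is a deliberately naive, hypothetical stand-in for a labeling algorithm or a human labeler, and `run_quality_inspection` is an illustrative name rather than anything defined by the patent.

```python
import random

def label_by_rule(text):
    # Hypothetical stand-in for a labeling algorithm or a human labeler.
    return "price" if "how much" in text.lower() else "other"

def run_quality_inspection(to_label, qc_items, labeler, seed=0):
    """Insert quality inspection items, label everything, then inspect.

    to_label: list of texts; qc_items: list of (text, standard_label) pairs.
    Returns (results, accuracy): results are the labels for the real data only,
    accuracy is measured on the inserted quality inspection items only.
    """
    rng = random.Random(seed)
    batch = [{"text": t, "standard": None, "is_qc": False} for t in to_label]
    batch += [{"text": t, "standard": lab, "is_qc": True} for t, lab in qc_items]
    rng.shuffle(batch)                           # step 102: insert
    for item in batch:
        item["assigned"] = labeler(item["text"])  # step 103: label
    qc = [item for item in batch if item["is_qc"]]
    correct = sum(1 for item in qc if item["assigned"] == item["standard"])
    accuracy = correct / len(qc) if qc else 0.0   # step 104: compare
    results = [(item["text"], item["assigned"]) for item in batch if not item["is_qc"]]
    return results, accuracy

to_label = ["How much is a filling?", "Is parking available?", "Opening hours?"]
qc_items = [
    ("How much is a routine blood test?", "price"),
    ("What does an X-ray cost?", "price"),
]
results, accuracy = run_quality_inspection(to_label, qc_items, label_by_rule)
print(accuracy)
```

The naive rule misses "What does an X-ray cost?", so one of the two quality inspection items is wrong and the measured accuracy is 50%, which is the kind of signal the quality inspection report is meant to surface.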
Automatic quality inspection is completed through a machine, and misjudgment caused by subjective reasons of manual quality inspection during quality inspection can be greatly avoided.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, and the scope of protection is still within the scope of the invention.

Claims (9)

1. A method for quality inspection of labeled data, characterized in that the quality inspection of the labeled data is realized through the following steps:
step 101, labeling standard data according to a standard rule and outputting a result, taking the original data as quality inspection data and taking the labeling result as the quality inspection data standard;
step 102, inserting the quality inspection data into data to be labeled, wherein the quality inspection data is standard data with a standard labeling result and is provided with an identifying label;
step 103, labeling the data, including the quality inspection data and the data to be labeled, to obtain a labeling result;
step 104, extracting the labeling result of the quality inspection data carrying the identifying label, and comparing it with the quality inspection data standard to obtain a comparison result, namely the quality inspection result of the labeled data.
2. The method according to claim 1, wherein the quality inspection data in step 101 is specifically labeled as follows:
a01, labeling a predetermined number of data of each category;
a02, checking each labeled category data, and checking and modifying the label definition one by one as a standard rule;
a03, selecting the data with the same category as the data to be labeled as standard data.
3. The method of claim 1, wherein the quantity of the quality inspection data is 1% -50% of the quantity of the data to be labeled.
4. The method of claim 3, wherein the quality inspection data is 10% of the data to be labeled.
5. The method for labeling data in quality control according to any one of claims 1 to 4, wherein 80% of the quality control data is original data, 10% of the quality control data is data with stop words modified/added/deleted, and 10% of the quality control data is data with common wrongly written words added to sentences.
6. The method of claim 1, wherein the quality inspection result in step 104 comprises an accuracy rate and a labeling bias, the accuracy rate is the percentage of the quality inspection data whose labeling result is consistent with the quality inspection data standard after comparison, relative to the total amount of quality inspection data, and the labeling bias is the classification category with a higher error ratio.
7. The method of claim 1, wherein the process in step 104 of extracting the labeling result of the quality inspection data carrying the identifying label and comparing it with the quality inspection data standard is implemented by a computer.
8. A system for quality inspection of labeled data, characterized by comprising a labeling data input module, a quality inspection data insertion module, a labeling data output module, a labeling result input module, a labeling result comparison module and a quality inspection report output module, wherein the labeling data input module is used for inputting data to be labeled; the quality inspection data insertion module is used for inserting quality inspection data according to the quantity and classification of the input data to be labeled, the quality inspection data is generated by the method of claim 1, and the quality inspection data comprises corresponding standard labeling results; the labeling data output module is used for outputting the data to be labeled into which the quality inspection data has been inserted; the labeling result input module is used for inputting the labeled data after labeling is completed; the labeling result comparison module is used for comparing the data carrying the identifying label in the input labeled data with the standard labeling results of the quality inspection data; and the quality inspection report output module is used for outputting a quality inspection result report generated by the comparison of the labeling result comparison module.
9. A device for quality inspection labeling data, which is characterized by comprising a memory, a processor and a transmission interface, wherein the memory is used for storing quality inspection data, temporarily storing input data to be labeled and input labeled data, the memory is also stored with a program for comparing a labeling result of the quality inspection data with a label with a quality inspection data standard and a program for generating a quality inspection report, the processor is used for realizing the method in claim 1 according to the information stored in the memory, and the transmission interface is used for accessing and outputting the data.
CN202010353465.0A 2020-04-29 2020-04-29 Method, system and device for quality inspection marking data Pending CN111581195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010353465.0A CN111581195A (en) 2020-04-29 2020-04-29 Method, system and device for quality inspection marking data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010353465.0A CN111581195A (en) 2020-04-29 2020-04-29 Method, system and device for quality inspection marking data

Publications (1)

Publication Number Publication Date
CN111581195A true CN111581195A (en) 2020-08-25

Family

ID=72122581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010353465.0A Pending CN111581195A (en) 2020-04-29 2020-04-29 Method, system and device for quality inspection marking data

Country Status (1)

Country Link
CN (1) CN111581195A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351681A1 (en) * 2016-06-03 2017-12-07 International Business Machines Corporation Label propagation in graphs
CN109086814A (en) * 2018-07-23 2018-12-25 腾讯科技(深圳)有限公司 A kind of data processing method, device and the network equipment
CN109684947A (en) * 2018-12-11 2019-04-26 广州景骐科技有限公司 Mark quality control method, device, computer equipment and storage medium
CN109815487A (en) * 2018-12-25 2019-05-28 平安科技(深圳)有限公司 Text quality detecting method, electronic device, computer equipment and storage medium
CN110457494A (en) * 2019-08-01 2019-11-15 新华智云科技有限公司 Data mask method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Steidl et al. Quality analysis of source code comments
CN109815487B (en) Text quality inspection method, electronic device, computer equipment and storage medium
CN109492164A (en) A kind of recommended method of resume, device, electronic equipment and storage medium
CN108090043B (en) Error correction report processing method and device based on artificial intelligence and readable medium
CN107908641B (en) Method and system for acquiring image annotation data
US20140046947A1 (en) Content revision using question and answer generation
CN107491536B (en) Test question checking method, test question checking device and electronic equipment
CN107153694B (en) Method, device, equipment and storage medium for automatically modifying question errors
CN111444718A (en) Insurance product demand document processing method and device and electronic equipment
WO2021174829A1 (en) Crowdsourced task inspection method, apparatus, computer device, and storage medium
CN114461777A (en) Intelligent question and answer method, device, equipment and storage medium
CN114840684A (en) Map construction method, device and equipment based on medical entity and storage medium
CN109582906A (en) Determination method, apparatus, equipment and the storage medium of data reliability
CN111143372B (en) Data processing method and device
Parra Escartín et al. Questing for quality estimation a user study
CN112395401A (en) Adaptive negative sample pair sampling method and device, electronic equipment and storage medium
CN111581195A (en) Method, system and device for quality inspection marking data
CN109189372B (en) Development script generation method of insurance product and terminal equipment
Konig et al. A semi-automatic verification tool for software requirements specification documents
CN116385189A (en) Method and system for checking matching degree of account listed subjects of financial account-reporting document
CN113050933B (en) Brain graph data processing method, device, equipment and storage medium
CN111461154A (en) Method and device for labeling data
CN113642337B (en) Data processing method and device, translation method, electronic device, and computer-readable storage medium
CN114780688A (en) Text quality inspection method, device and equipment based on rule matching and storage medium
US11087097B2 (en) Automatic item generation for passage-based assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825
