CN111881657A - Intelligent marking method, terminal equipment and storage medium - Google Patents
- Publication number
- CN111881657A (application number CN202010773242.XA)
- Authority
- CN
- China
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
Abstract
The invention relates to an intelligent labeling method, a terminal device and a storage medium. The method comprises the following steps: S1: importing a file consisting of the data to be labeled; S2: pre-labeling all data in the file with a natural language processing algorithm; S3: dividing all the pre-labeled data into a plurality of sub-data sets and sending the data in each sub-data set to a different operation terminal; S4: after the labeled data returned by all the operation terminals has been received, sending it to operation terminals for auditing; S5: once all audit results have been received, generating a label file for each audited data item. By combining pre-labeling, manual labeling and auditing of the data, the invention realizes intelligent labeling and improves both labeling speed and accuracy.
Description
Technical Field
The invention relates to the field of data annotation, and in particular to an intelligent annotation method, a terminal device and a storage medium.
Background
At present, data annotation is the most time- and labor-consuming stage in the early phase of artificial intelligence development, and it is a stage that cannot yet be skipped. Samples are labeled to obtain a training set, the model is trained on that training set, and artificial intelligence is ultimately realized. Fast, intelligent labeling therefore greatly reduces the overall cost of artificial intelligence and promotes its rapid development.
Disclosure of Invention
To solve the above problem of slow, labor-intensive labeling, the present invention provides an intelligent labeling method, a terminal device and a storage medium.
The specific scheme is as follows:
An intelligent labeling method, comprising the following steps:
S1: importing a file consisting of the data to be labeled;
S2: pre-labeling all data in the file with a natural language processing algorithm;
S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data in each sub-data set to a different operation terminal;
S4: after the labeled data returned by all the operation terminals has been received, sending it to operation terminals for auditing;
S5: when all audit results have been received, generating a label file for each audited data item.
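The five steps can be read as a simple control flow. The following Python sketch is illustrative only and is not the patented implementation: the record fields (`id`, `text`, `pre_label`, `label`), the function name `run_labeling_pipeline` and the callback signatures are assumptions, and each operation terminal is modeled as a plain function that returns its sub-data set with manual labels filled in.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def run_labeling_pipeline(
    records: List[Record],
    pre_label: Callable[[Record], object],
    terminals: List[Callable[[List[Record]], List[Record]]],
    audit: Callable[[List[Record]], List[Record]],
) -> List[Record]:
    """Hypothetical control flow for S2-S5; `records` is the output of S1."""
    if not terminals:
        raise ValueError("at least one operation terminal is required")
    # S2: attach a machine-generated pre-label to every record (mutates records).
    for r in records:
        r["pre_label"] = pre_label(r)
    # S3: one contiguous sub-data set per operation terminal (ceiling division).
    size = -(-len(records) // len(terminals))
    subsets = [records[k * size:(k + 1) * size] for k in range(len(terminals))]
    # Each terminal returns its sub-data set with a manual "label" filled in.
    labeled: List[Record] = []
    for terminal, subset in zip(terminals, subsets):
        labeled.extend(terminal(subset))
    # S4: auditing starts only after every terminal has returned its results.
    audited = audit(labeled)
    # S5: one label document per audited item.
    return [{"id": r["id"], "label": r["label"]} for r in audited]
```

Running the terminals one after another keeps the sketch short; in practice the workers label concurrently, and the pipeline would wait for all of them before starting S4.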
Further, the importing process comprises: in the file uploading stage, dividing the file into a plurality of sub-blocks, taking one sub-block of the file for transmission at a time, calculating the storage position of each sub-block from the storage position of the file and the offset of the sub-block within the file, and storing each sub-block at its calculated storage position;
in the data reading stage, reading only part of the file's contents at a time, recording the position of that part within the file, and reading the next part only after the data contained in the current part has been processed.
Further, the audit in step S4 is one of a normal audit, an intelligent audit and a cross audit.
Further, the normal audit comprises: sending all the labeled data to one or more operation terminals, where the worker at each operation terminal examines the data item by item.
Further, the intelligent audit comprises: judging, for all the labeled data, whether each labeling result is consistent with the corresponding pre-labeling result; if so, the item passes the audit; otherwise, the items with inconsistent results are sent to an operation terminal, and the worker at that operation terminal audits them.
Further, the cross audit comprises: sending each labeled data item to at least two operation terminals, so that each item is audited by at least two workers.
An intelligent labeling terminal device comprises a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the embodiments of the invention.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of the embodiments of the invention described above.
With the above technical scheme, the invention realizes intelligent labeling and, through pre-labeling, manual labeling and auditing of the data, improves labeling speed and accuracy.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Detailed Description
To further illustrate the embodiments, the invention is accompanied by drawings. These drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With reference to these drawings, those skilled in the art will appreciate other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the accompanying drawings and detailed description.
Embodiment 1:
An embodiment of the invention provides an intelligent labeling method which, as shown in Fig. 1, comprises the following steps:
S1: Import a file consisting of the data to be labeled.
Data sets for training network models are usually large, so the files that hold them are large as well. When such a file reaches the GB or TB scale, conventional import methods can run into problems such as service response timeouts and memory overflow. To solve this problem, this embodiment adopts the following method:
(1) In the file uploading stage, the file is divided into a plurality of sub-blocks, and one sub-block is taken from the file and transmitted at a time. The storage position of each sub-block is calculated from the storage position of the file and the offset of the sub-block within the file, and each sub-block is stored at its calculated position.
(2) In the data reading stage, only part of the file's contents is read at a time, the position of that part within the file is recorded, and the next part is read only after the data contained in the current part has been processed.
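A minimal Python sketch of the upload stage (1), assuming a sub-block size of 4 MiB and a destination that is an ordinary file whose storage position is given as a byte offset `base_position` (0 for a dedicated file). The names `iter_sub_blocks` and `store_sub_block` are hypothetical. Because each sub-block is written at the base position plus its own offset, sub-blocks can arrive in any order.

```python
import os

CHUNK_SIZE = 4 * 1024 * 1024  # assumed sub-block size: 4 MiB


def iter_sub_blocks(src_path, chunk_size=CHUNK_SIZE):
    """Client side: cut the source file into sub-blocks, yielding one at a time."""
    with open(src_path, "rb") as f:
        offset = 0
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            yield offset, block
            offset += len(block)


def store_sub_block(dest_path, base_position, offset, block):
    """Server side: write one sub-block at (file storage position + offset)."""
    # Create the destination if missing (not safe against concurrent creators).
    if not os.path.exists(dest_path):
        open(dest_path, "wb").close()
    # "r+b" allows writing at an arbitrary position without truncating the file.
    with open(dest_path, "r+b") as f:
        f.seek(base_position + offset)
        f.write(block)
```

Only one sub-block is held in memory at a time, which is what keeps memory use bounded regardless of file size. A real upload service would also need retries and coordination between concurrent writers, which this sketch ignores.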
This method solves the problem of uploading very large files, responds quickly and occupies little memory.
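The reading stage (2) can be sketched as a Python generator, assuming line-oriented records (one UTF-8 data item per line). `read_in_parts` and its parameters are hypothetical. The file is opened in binary mode so that `f.tell()` returns a real byte offset that can be recorded and later passed back as `start` to resume; the next part is only read when the consumer has finished with the current one and asks for it.

```python
def read_in_parts(path, start=0, max_lines=1000):
    """Yield (lines, position) pairs, reading at most `max_lines` lines per part.

    `position` is the byte offset just after the part, i.e. the recorded
    position from which reading can resume.
    """
    with open(path, "rb") as f:
        f.seek(start)
        while True:
            part = []
            for _ in range(max_lines):
                line = f.readline()
                if not line:  # end of file
                    break
                part.append(line.decode("utf-8").rstrip("\r\n"))
            if not part:
                return
            # The generator pauses here until the caller has processed `part`.
            yield part, f.tell()
```

A caller would process each part, persist the returned position, and on restart call `read_in_parts(path, start=saved_position)`.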
S2: Pre-label all data in the file with a natural language processing (NLP) algorithm.
Those skilled in the art can customize the NLP model according to the type of data to be pre-labeled. The pre-labels in this embodiment include, but are not limited to, part-of-speech tags, word segmentation labels, named entity labels, dependency relation labels, entity relation labels, text classification labels and sentiment classification labels.
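As an illustration only, the sketch below stands in for the NLP model with a longest-match dictionary lookup that produces named-entity pre-labels as `(start, end, type)` spans. A real system would use a trained model; the lexicon, the span format and `pre_label_entities` are assumptions. Matching proceeds character by character, which suits unsegmented Chinese text but ignores word boundaries in languages such as English.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type); end is exclusive


def pre_label_entities(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Toy stand-in for an NLP model: greedy longest-match dictionary lookup."""
    spans: List[Span] = []
    # Try longer entries first so that "Peking University" wins over "Peking".
    terms = sorted(lexicon, key=len, reverse=True)
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                spans.append((i, i + len(term), lexicon[term]))
                i += len(term)
                break
        else:  # no entry starts here; advance one character
            i += 1
    return spans
```

The pre-labels are only suggestions: in step S3 a worker confirms or corrects them.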
S3: Divide all the pre-labeled data into a plurality of sub-data sets, and send the data in each sub-data set to a different operation terminal.
In this embodiment, the different operation terminals are computing terminals used by different workers; a computing terminal may be a PC, a mobile phone, a palmtop computer or the like, without limitation. After the pre-labeled data arrives at a worker's computing terminal, the worker labels it manually and submits the result when finished.
Splitting the mass of data in this way lets several workers label in parallel, which improves labeling efficiency.
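One way to form the sub-data sets is sketched below, assuming contiguous, near-equal slices keyed by a hypothetical terminal identifier; the patent does not specify the splitting rule, and `split_into_subsets` is an illustrative name.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subsets(items: Sequence[T], terminal_ids: Sequence[str]) -> Dict[str, List[T]]:
    """Divide items into len(terminal_ids) contiguous, near-equal sub-data sets."""
    if not terminal_ids:
        raise ValueError("at least one operation terminal is required")
    base, extra = divmod(len(items), len(terminal_ids))
    subsets: Dict[str, List[T]] = {}
    start = 0
    for k, tid in enumerate(terminal_ids):
        size = base + (1 if k < extra else 0)  # the first `extra` terminals get one more
        subsets[tid] = list(items[start:start + size])
        start += size
    return subsets
```

Sizes differ by at most one item, so no worker receives a disproportionate share.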
S4: After the labeled data returned by all the operation terminals has been received, send it to operation terminals for auditing.
The audit in this embodiment is one of a normal audit, an intelligent audit and a cross audit.
The normal audit comprises: sending all the labeled data to one or more operation terminals, where the worker at each operation terminal examines the data item by item.
The intelligent audit comprises: judging, for all the labeled data, whether each labeling result is consistent with the corresponding pre-labeling result; if so, the item passes the audit; otherwise, the items with inconsistent results are sent to an operation terminal, and the worker at that operation terminal audits them.
The cross audit comprises: sending each labeled data item to at least two operation terminals, so that each item is audited by at least two workers.
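The intelligent audit amounts to a per-item comparison between the manual label and the pre-label. The sketch below assumes each record carries `label` and `pre_label` fields and marks automatically passed items with a hypothetical `audit` field; only the mismatched items go to a human reviewer.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def intelligent_audit(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split labeled records into auto-passed items and items needing a reviewer."""
    passed: List[Record] = []
    to_review: List[Record] = []
    for r in records:
        if r["label"] == r["pre_label"]:
            # Manual label agrees with the pre-label: the audit passes.
            passed.append(dict(r, audit="passed"))
        else:
            # Disagreement: send to an operation terminal for human auditing.
            to_review.append(r)
    return passed, to_review
```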
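For the cross audit, the sketch below assigns each item to `copies` distinct reviewers in rotation, so no reviewer sees the same item twice. The reviewer identifiers are hypothetical, and the rule that an item passes only when all of its reviewers approve is an assumption; the patent does not say how disagreements between auditors are resolved.

```python
from typing import Dict, List, Sequence


def assign_cross_audit(item_ids: Sequence[int], reviewers: Sequence[str], copies: int = 2) -> Dict[int, List[str]]:
    """Assign each item to `copies` distinct reviewers, rotating through the pool."""
    if copies < 2 or copies > len(reviewers):
        raise ValueError("cross audit needs at least two distinct reviewers per item")
    n = len(reviewers)
    return {
        item: [reviewers[(i + j) % n] for j in range(copies)]
        for i, item in enumerate(item_ids)
    }


def cross_audit_passed(verdicts: Dict[str, bool]) -> bool:
    """Assumed rule: pass only if at least two reviewers voted and all approved."""
    return len(verdicts) >= 2 and all(verdicts.values())
```

Rotating the starting reviewer spreads the workload evenly across the pool.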
S5: When all audit results have been received, generate a label file for each audited data item.
The label files generated in this embodiment are in JSON or XML format.
Through pre-labeling, manual labeling and auditing of the data, this embodiment realizes intelligent labeling and improves labeling speed and accuracy.
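Serializing one audited record to either format can be done with the Python standard library, as sketched below. The root element name `annotation` and the flat one-element-per-field XML layout are assumptions, and the record keys must be valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_json_label(record: dict) -> str:
    """Serialize one audited record as the body of a JSON label file."""
    # ensure_ascii=False keeps Chinese text readable in the file.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def to_xml_label(record: dict) -> str:
    """Serialize one audited record as a flat XML document."""
    root = ET.Element("annotation")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```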
Embodiment 2:
The invention further provides an intelligent labeling terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the method of the first embodiment of the invention.
Further, as a feasible implementation, the intelligent labeling terminal device may be a computing device such as a PC, a mobile phone, a palmtop computer or a cloud server. The intelligent labeling terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that this composition is only an example of the intelligent labeling terminal device and does not limit it: the device may include more or fewer components than described, combine certain components, or use different components. For example, it may further include input/output devices, network access devices, a bus and the like, which is not limited by the embodiments of the invention.
Further, as a feasible implementation, the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the intelligent labeling terminal device and connects all parts of the device through various interfaces and lines.
The memory may be used to store the computer program and/or modules; the processor implements the various functions of the intelligent labeling terminal device by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created through use of the device (for example, a mobile phone). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the above embodiment of the invention.
If the modules/units integrated in the intelligent labeling terminal device are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the flow of the method of the above embodiments of the invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of each method embodiment. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. An intelligent labeling method, characterized by comprising the following steps:
S1: importing a file consisting of the data to be labeled;
S2: pre-labeling all data in the file with a natural language processing algorithm;
S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data in each sub-data set to a different operation terminal;
S4: after the labeled data returned by all the operation terminals has been received, sending it to operation terminals for auditing;
S5: when all audit results have been received, generating a label file for each audited data item.
2. The intelligent labeling method according to claim 1, wherein the importing process comprises: in the file uploading stage, dividing the file into a plurality of sub-blocks, taking one sub-block of the file for transmission at a time, calculating the storage position of each sub-block from the storage position of the file and the offset of the sub-block within the file, and storing each sub-block at its calculated storage position;
in the data reading stage, reading only part of the file's contents at a time, recording the position of that part within the file, and reading the next part only after the data contained in the current part has been processed.
3. The intelligent labeling method according to claim 1, wherein the audit in step S4 is one of a normal audit, an intelligent audit and a cross audit.
4. The intelligent labeling method according to claim 3, wherein the normal audit comprises: sending all the labeled data to one or more operation terminals, where the worker at each operation terminal examines the data item by item.
5. The intelligent labeling method according to claim 3, wherein the intelligent audit comprises: judging, for all the labeled data, whether each labeling result is consistent with the corresponding pre-labeling result; if so, the item passes the audit; otherwise, the items with inconsistent results are sent to an operation terminal, and the worker at that operation terminal audits them.
6. The intelligent labeling method according to claim 3, wherein the cross audit comprises: sending each labeled data item to at least two operation terminals, so that each item is audited by at least two workers.
7. An intelligent labeling terminal device, characterized by comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010773242.XA CN111881657A (en) | 2020-08-04 | 2020-08-04 | Intelligent marking method, terminal equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010773242.XA CN111881657A (en) | 2020-08-04 | 2020-08-04 | Intelligent marking method, terminal equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111881657A (en) | 2020-11-03 |
Family
ID=73211771
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010773242.XA Pending CN111881657A (en) | 2020-08-04 | 2020-08-04 | Intelligent marking method, terminal equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881657A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906349A (en) * | 2021-03-30 | 2021-06-04 | 苏州大学 | Data annotation method, system, equipment and readable storage medium |
CN113569546A (en) * | 2021-06-16 | 2021-10-29 | 上海淇玥信息技术有限公司 | Intention labeling method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120323925A1 (en) * | 2011-06-17 | 2012-12-20 | Fitzsimmons Jeffrey E | System and Method for Synchronously Generating An Index to a Media Stream |
CN103685343A (en) * | 2012-09-03 | 2014-03-26 | 腾讯科技(深圳)有限公司 | File transfer method and file transfer system |
CN107705034A (en) * | 2017-10-26 | 2018-02-16 | 医渡云(北京)技术有限公司 | Mass-rent platform implementation method and device, storage medium and electronic equipment |
CN110245716A (en) * | 2019-06-20 | 2019-09-17 | 杭州睿琪软件有限公司 | Sample labeling auditing method and device |
CN110297914A (en) * | 2019-06-14 | 2019-10-01 | 中译语通科技股份有限公司 | Corpus labeling method and device |
Non-Patent Citations (1)
Title |
---|
Yan Xiaoqing (阎晓青) et al.: "Sybase Open Client Application Development Guide" (《Sybase Open Client应用开发指南》), 28 February 1998 * |
Similar Documents
Publication | Title |
---|---|
CN111274782A (en) | Text auditing method and device, computer equipment and readable storage medium | |
US20210049711A1 (en) | Method of automatically transmitting data information and device of automatically transmitting data information | |
CN108876213B (en) | Block chain-based product management method, device, medium and electronic equipment | |
CN110427487B (en) | Data labeling method and device and storage medium | |
CN105095755A (en) | File recognition method and apparatus | |
CN109388675A (en) | Data analysing method, device, computer equipment and storage medium | |
CN108376364B (en) | Payment system account checking method and device and terminal device | |
CN108768929A (en) | The analytic method and storage medium of electronic device, reference feedback message | |
CN111881657A (en) | Intelligent marking method, terminal equipment and storage medium | |
CN113746758A (en) | Method and terminal for dynamically identifying flow protocol | |
US20210357251A1 (en) | Electronic device and non-transitory storage medium implementing test path coordination method | |
CN114240101A (en) | Risk identification model verification method, device and equipment | |
CN112434884A (en) | Method and device for establishing supplier classified portrait | |
CN113590102A (en) | Zero-code rapid software development method, system, medium and equipment | |
CN112434970A (en) | Qualification data verification method and device based on intelligent data acquisition | |
CN111143434A (en) | Intelligent data checking method, device, equipment and storage medium | |
CN110633251B (en) | File conversion method and equipment | |
CN113268665A (en) | Information recommendation method, device and equipment based on random forest and storage medium | |
CN110110295B (en) | Large sample research and report information extraction method, device, equipment and storage medium | |
CN110377891B (en) | Method, device and equipment for generating event analysis article and computer readable storage medium | |
CN116775575A (en) | File merging method and device, electronic equipment and storage medium | |
CN114338850B (en) | Message checking method, device, terminal equipment and computer readable storage medium | |
CN110659190A (en) | Quality report generation method, quality report generation device, quality report generation equipment and computer readable storage medium | |
CN109491622B (en) | Printing method applied to transaction terminal and transaction terminal | |
CN113343663A (en) | Bill structuring method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201103 |