CN111881657A - Intelligent marking method, terminal equipment and storage medium - Google Patents

Info

Publication number
CN111881657A
Authority
CN
China
Prior art keywords
data
file
sub
intelligent
auditing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010773242.XA
Other languages
Chinese (zh)
Inventor
洪万福
钱智毅
李世贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yuanting Information Technology Co ltd
Original Assignee
Xiamen Yuanting Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yuanting Information Technology Co ltd
Priority to CN202010773242.XA
Publication of CN111881657A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes

Abstract

The invention relates to an intelligent labeling method, a terminal device and a storage medium. The method comprises the following steps: S1: importing a file consisting of data to be labeled; S2: pre-labeling all data in the file according to a natural language processing algorithm; S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data contained in each sub-data set to a different operation terminal; S4: after receiving the labeled data returned by all the operation terminals, sending the data to operation terminals for auditing; S5: when all the auditing results are received, generating a label file corresponding to each piece of audited data. By pre-labeling, manually labeling and auditing the data, the invention not only realizes intelligent labeling but also improves labeling speed and accuracy.

Description

Intelligent marking method, terminal equipment and storage medium
Technical Field
The invention relates to the field of data annotation, in particular to an intelligent annotation method, terminal equipment and a storage medium.
Background
Currently, data annotation is the most time-consuming and labor-intensive link in the early stage of artificial intelligence work, and it is a link that artificial intelligence cannot skip at the present stage. Training set data for a model is obtained by labeling samples, the model is trained on that training set data, and artificial intelligence is finally realized. Fast and intelligent labeling therefore greatly reduces the overall cost of artificial intelligence and promotes its rapid development.
Disclosure of Invention
In order to solve the above problems, the present invention provides an intelligent labeling method, a terminal device and a storage medium.
The specific scheme is as follows:
An intelligent labeling method comprises the following steps:
S1: importing a file consisting of data to be labeled;
S2: pre-labeling all data in the file according to a natural language processing algorithm;
S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data contained in each sub-data set to a different operation terminal;
S4: after receiving the labeled data returned by all the operation terminals, sending the data to operation terminals for auditing;
S5: when all the auditing results are received, generating a label file corresponding to each piece of audited data.
Further, the importing process comprises: in the file uploading stage, dividing the file into a plurality of sub-blocks, intercepting one sub-block of the file for transmission each time, calculating the storage position of each sub-block according to the storage position of the file and the offset of the sub-block in the file, and storing each sub-block at the calculated storage position;
in the data reading stage, reading only a part of the contents of the file each time, recording the position of the read part in the file, and reading the next part after the data contained in the current part has been processed.
Further, the audit in step S4 is one of a common audit, an intelligent audit and a cross audit.
Further, the common auditing method comprises: sending all labeled data to one or more operation terminals, where the worker corresponding to each operation terminal audits the data item by item.
Further, the intelligent auditing method comprises: judging, for all the labeled data, whether the labeling result is consistent with the pre-labeled result; if so, the audit is passed; otherwise, the data with inconsistent results is sent to an operation terminal, and the worker corresponding to that operation terminal audits the data with inconsistent results.
Further, the cross auditing method comprises: sending each piece of labeled data to at least two operation terminals, so that each piece of data is audited by at least two corresponding workers.
An intelligent labeling terminal device comprises a processor, a memory and a computer program which is stored in the memory and can run on the processor, wherein the processor executes the computer program to realize the steps of the method of the embodiment of the invention.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to an embodiment of the invention as described above.
By adopting the technical scheme, the invention not only realizes intelligent marking, but also improves the speed and the accuracy of marking through the processes of pre-marking, manually marking and auditing the data.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
The embodiment of the invention provides an intelligent labeling method. As shown in Fig. 1, the method comprises the following steps:
S1: importing a file consisting of the data to be labeled.
Because a data set used for network model training is usually large, the file formed from it is often large as well. During import, if the file containing the data is large, for example at the GB or TB level, a traditional import approach may run into problems such as service response timeouts and memory overflow. To solve this problem, this embodiment adopts the following method:
(1) In the file uploading stage, the file is divided into a plurality of sub-blocks and one sub-block of the file is intercepted for transmission each time. The storage position of each sub-block is calculated according to the storage position of the file and the offset of the sub-block in the file, and each sub-block is stored at the calculated storage position.
(2) In the data reading stage, only a part of the contents of the file is read each time and the position of the read part in the file is recorded. The next part is read after the data contained in the current part has been processed.
This method solves the problem of uploading oversized files, and features quick response and low memory usage.
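The two stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`split_into_chunks`, `store_chunk`, `read_in_parts`), the tiny chunk size and the use of a local temporary file in place of a remote server are all hypothetical and do not come from the patent.

```python
import os
import tempfile

CHUNK_SIZE = 4  # bytes per sub-block; deliberately tiny, for illustration only


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Yield (offset, sub_block) pairs, one sub-block at a time."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def store_chunk(path, offset, chunk):
    """Store a sub-block at (start of file + offset of the sub-block)."""
    if not os.path.exists(path):
        open(path, "wb").close()  # create an empty file on first use
    with open(path, "r+b") as f:  # "r+b" keeps content already written
        f.seek(offset)
        f.write(chunk)


def read_in_parts(path, part_size, position=0):
    """Yield (next_position, part), holding only one part in memory."""
    with open(path, "rb") as f:
        f.seek(position)
        while True:
            part = f.read(part_size)
            if not part:
                break
            position += len(part)  # recorded position for the next read
            yield position, part


payload = b"record-1\nrecord-2\nrecord-3\n"
target = os.path.join(tempfile.mkdtemp(), "upload.bin")

# Sub-blocks may arrive in any order because each carries its own offset.
for offset, chunk in reversed(list(split_into_chunks(payload))):
    store_chunk(target, offset, chunk)

with open(target, "rb") as f:
    stored = f.read()
parts = [part for _, part in read_in_parts(target, part_size=10)]
```

Because every sub-block is written at its own offset, a failed transfer would only require resending that sub-block, and the recorded read position lets processing resume without loading the whole file.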
S2: pre-labeling all data in the file according to a Natural Language Processing (NLP) algorithm.
Those skilled in the art can customize the NLP model to suit the type of data being pre-labeled. The pre-labels in this embodiment include, but are not limited to, part-of-speech labels, word segmentation labels, named entity labels, dependency relationship labels, entity relationship labels, text classification labels, emotion classification labels, and the like.
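As a rough illustration of what a named-entity pre-label might look like, the sketch below uses a small lookup table in place of a trained NLP model. The patent does not specify any model or output format, so the `GAZETTEER` table, the `pre_label` function and the span dictionary layout are all assumptions made for this example.

```python
import re

# Hypothetical lookup table standing in for a trained NLP model.
GAZETTEER = {"Xiamen": "LOCATION", "Beijing": "LOCATION", "Alice": "PERSON"}


def pre_label(text):
    """Return named-entity pre-labels as character spans."""
    labels = []
    for match in re.finditer(r"\w+", text):
        tag = GAZETTEER.get(match.group())
        if tag is not None:
            labels.append({
                "start": match.start(),  # inclusive character offset
                "end": match.end(),      # exclusive character offset
                "text": match.group(),
                "label": tag,
            })
    return labels


result = pre_label("Alice moved from Beijing to Xiamen.")
```

In practice the lookup would be replaced by whichever model suits the data; the point is only that each pre-label is stored alongside the data so that a worker can later confirm or correct it.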
S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data contained in each sub-data set to a different operation terminal.
In this embodiment, the different operation terminals are computing terminals used by different workers. A computing terminal may be a PC, a mobile phone, a palm computer, or the like, which is not limited herein. After the pre-labeled data is received on the computing terminal used by each worker, the worker labels the data manually and submits the result once labeling is complete.
By segmenting the mass data, the efficiency of annotation can be improved.
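The patent does not say how the data is divided among terminals. One simple possibility, shown here purely as an assumption, is round-robin assignment; the function name `split_into_subsets` and the terminal identifiers are hypothetical.

```python
def split_into_subsets(items, terminals):
    """Assign items round-robin so each item goes to exactly one terminal."""
    if not terminals:
        raise ValueError("at least one operation terminal is required")
    assignment = {terminal: [] for terminal in terminals}
    for index, item in enumerate(items):
        assignment[terminals[index % len(terminals)]].append(item)
    return assignment


items = [f"sentence-{i}" for i in range(7)]
assignment = split_into_subsets(
    items, ["terminal-A", "terminal-B", "terminal-C"]
)
```

Round-robin keeps the sub-data sets within one item of each other in size; a real system might instead weight the split by worker availability or skill.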
S4: after receiving the labeled data returned by all the operation terminals, sending the data to operation terminals for auditing.
The audit in this embodiment is one of a common audit, an intelligent audit, and a cross audit.
The common auditing method comprises: sending all labeled data to one or more operation terminals, where the worker corresponding to each operation terminal audits the data item by item.
The intelligent auditing method comprises: judging, for all the labeled data, whether the labeling result is consistent with the pre-labeled result; if so, the audit is passed; otherwise, the data with inconsistent results is sent to an operation terminal, and the worker corresponding to that operation terminal audits the data with inconsistent results.
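The comparison at the heart of the intelligent audit can be sketched as follows. This assumes the check is made item by item and that labels are directly comparable values; the function name `intelligent_audit` and the dictionary layout are illustrative, not taken from the patent.

```python
def intelligent_audit(pre_labels, manual_labels):
    """Split item ids into those that pass and those needing manual audit."""
    passed, needs_review = [], []
    for item_id, manual in manual_labels.items():
        if pre_labels.get(item_id) == manual:
            passed.append(item_id)        # consistent with the pre-label
        else:
            needs_review.append(item_id)  # inconsistent: send to a worker
    return passed, needs_review


pre = {"s1": "POSITIVE", "s2": "NEGATIVE", "s3": "NEUTRAL"}
manual = {"s1": "POSITIVE", "s2": "POSITIVE", "s3": "NEUTRAL"}
passed, needs_review = intelligent_audit(pre, manual)
```

Only the items in `needs_review` would be forwarded to an operation terminal, which is where the saving in manual audit effort comes from.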
The cross auditing method comprises: sending each piece of labeled data to at least two operation terminals, so that each piece of data is audited by at least two corresponding workers.
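A minimal sketch of cross-audit assignment is given below. It assumes a rotating scheme so that each item reaches distinct reviewers; the patent only requires at least two terminals per item, so the rotation, the function name `assign_cross_audit` and the reviewer identifiers are assumptions.

```python
def assign_cross_audit(item_ids, reviewers, reviewers_per_item=2):
    """Give every item to `reviewers_per_item` distinct reviewers."""
    if reviewers_per_item < 2:
        raise ValueError("cross audit needs at least two reviewers per item")
    if len(reviewers) < reviewers_per_item:
        raise ValueError("not enough reviewers for a cross audit")
    assignment = {}
    for index, item_id in enumerate(item_ids):
        # Consecutive positions in the rotation are always distinct
        # because reviewers_per_item <= len(reviewers).
        assignment[item_id] = [
            reviewers[(index + k) % len(reviewers)]
            for k in range(reviewers_per_item)
        ]
    return assignment


cross = assign_cross_audit(["d1", "d2", "d3", "d4"], ["r1", "r2", "r3"])
```

How disagreements between the reviewers are resolved is not described in the source and is left out of this sketch.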
S5: when all the auditing results are received, generating a label file corresponding to each piece of audited data.
The label file generated in this embodiment is in JSON or XML format.
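The source states only that the output is JSON or XML and gives no schema. The sketch below therefore uses a hypothetical record layout (`id`, `text`, `labels`) and hypothetical element names to show how one audited record might be serialized in either format with the Python standard library.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record):
    """Serialize one audited record as a JSON label file body."""
    return json.dumps(record, ensure_ascii=False, indent=2)


def to_xml(record):
    """Serialize the same record as XML (element names are illustrative)."""
    root = ET.Element("annotation", id=str(record["id"]))
    ET.SubElement(root, "text").text = record["text"]
    labels = ET.SubElement(root, "labels")
    for label in record["labels"]:
        ET.SubElement(
            labels,
            "label",
            start=str(label["start"]),
            end=str(label["end"]),
            type=label["type"],
        )
    return ET.tostring(root, encoding="unicode")


record = {
    "id": 1,
    "text": "Alice lives in Xiamen.",
    "labels": [
        {"start": 0, "end": 5, "type": "PERSON"},
        {"start": 15, "end": 21, "type": "LOCATION"},
    ],
}
json_body = to_json(record)
xml_body = to_xml(record)
```

Writing one file per audited record matches the wording of step S5; whether records are grouped into larger files is an implementation choice the source leaves open.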
According to the embodiment of the invention, through the processes of pre-labeling, manual labeling and auditing the data, not only is intelligent labeling realized, but also the labeling speed and accuracy are improved.
The second embodiment is as follows:
The invention also provides an intelligent labeling terminal device, which comprises a memory, a processor and a computer program that is stored in the memory and can run on the processor, wherein the steps of the method of the first embodiment of the invention are realized when the processor executes the computer program.
Further, as an executable scheme, the intelligent labeling terminal device may be a computing device such as a PC, a mobile phone, a palm computer, or a cloud server. The intelligent labeling terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the composition described above is only an example of the intelligent labeling terminal device and does not limit it; the device may include more or fewer components than described, combine some components, or use different components. For example, the intelligent labeling terminal device may further include an input/output device, a network access device, a bus, and the like, which is not limited by the embodiment of the present invention.
Further, as an executable solution, the processor may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general processor can be a microprocessor or the processor can be any conventional processor, and the processor is a control center of the intelligent labeling terminal device and is connected with various parts of the whole intelligent labeling terminal device by various interfaces and lines.
The memory can be used for storing the computer program and/or the module, and the processor can realize various functions of the intelligent labeling terminal device by running or executing the computer program and/or the module stored in the memory and calling the data stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the mobile phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method of an embodiment of the invention.
The module/unit integrated with the intelligent labeling terminal device can be stored in a computer readable storage medium if it is implemented in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. An intelligent labeling method, characterized by comprising the following steps:
S1: importing a file consisting of data to be labeled;
S2: pre-labeling all data in the file according to a natural language processing algorithm;
S3: dividing all the pre-labeled data into a plurality of sub-data sets, and sending the data contained in each sub-data set to a different operation terminal;
S4: after receiving the labeled data returned by all the operation terminals, sending the data to operation terminals for auditing;
S5: when all the auditing results are received, generating a label file corresponding to each piece of audited data.
2. The intelligent labeling method of claim 1, wherein the importing process comprises: in the file uploading stage, dividing the file into a plurality of sub-blocks, intercepting one sub-block of the file for transmission each time, calculating the storage position of each sub-block according to the storage position of the file and the offset of the sub-block in the file, and storing each sub-block at the calculated storage position;
in the data reading stage, reading only a part of the contents of the file each time, recording the position of the read part in the file, and reading the next part after the data contained in the current part has been processed.
3. The intelligent labeling method of claim 1, wherein the audit in step S4 is one of a common audit, an intelligent audit and a cross audit.
4. The intelligent labeling method of claim 3, wherein the common auditing method comprises: sending all labeled data to one or more operation terminals, where the worker corresponding to each operation terminal audits the data item by item.
5. The intelligent labeling method of claim 3, wherein the intelligent auditing method comprises: judging, for all the labeled data, whether the labeling result is consistent with the pre-labeled result; if so, the audit is passed; otherwise, the data with inconsistent results is sent to an operation terminal, and the worker corresponding to that operation terminal audits the data with inconsistent results.
6. The intelligent labeling method of claim 3, wherein the cross auditing method comprises: sending each piece of labeled data to at least two operation terminals, so that each piece of data is audited by at least two corresponding workers.
7. An intelligent labeling terminal device, characterized in that: it comprises a processor, a memory and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any one of claims 1 to 6 when executing the computer program.
8. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202010773242.XA 2020-08-04 2020-08-04 Intelligent marking method, terminal equipment and storage medium Pending CN111881657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010773242.XA CN111881657A (en) 2020-08-04 2020-08-04 Intelligent marking method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010773242.XA CN111881657A (en) 2020-08-04 2020-08-04 Intelligent marking method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111881657A true CN111881657A (en) 2020-11-03

Family

ID=73211771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010773242.XA Pending CN111881657A (en) 2020-08-04 2020-08-04 Intelligent marking method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111881657A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906349A (en) * 2021-03-30 2021-06-04 苏州大学 Data annotation method, system, equipment and readable storage medium
CN113569546A (en) * 2021-06-16 2021-10-29 上海淇玥信息技术有限公司 Intention labeling method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323925A1 (en) * 2011-06-17 2012-12-20 Fitzsimmons Jeffrey E System and Method for Synchronously Generating An Index to a Media Stream
CN103685343A (en) * 2012-09-03 2014-03-26 腾讯科技(深圳)有限公司 File transfer method and file transfer system
CN107705034A (en) * 2017-10-26 2018-02-16 医渡云(北京)技术有限公司 Mass-rent platform implementation method and device, storage medium and electronic equipment
CN110245716A (en) * 2019-06-20 2019-09-17 杭州睿琪软件有限公司 Sample labeling auditing method and device
CN110297914A (en) * 2019-06-14 2019-10-01 中译语通科技股份有限公司 Corpus labeling method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
阎晓青 等: "《Sybase Open Client应用开发指南》", 28 February 1998 *

Similar Documents

Publication Publication Date Title
CN111274782A (en) Text auditing method and device, computer equipment and readable storage medium
US20210049711A1 (en) Method of automatically transmitting data information and device of automatically transmitting data information
CN108876213B (en) Block chain-based product management method, device, medium and electronic equipment
CN110427487B (en) Data labeling method and device and storage medium
CN105095755A (en) File recognition method and apparatus
CN109388675A (en) Data analysing method, device, computer equipment and storage medium
CN108376364B (en) Payment system account checking method and device and terminal device
CN108768929A (en) The analytic method and storage medium of electronic device, reference feedback message
CN111881657A (en) Intelligent marking method, terminal equipment and storage medium
CN113746758A (en) Method and terminal for dynamically identifying flow protocol
US20210357251A1 (en) Electronic device and non-transitory storage medium implementing test path coordination method
CN114240101A (en) Risk identification model verification method, device and equipment
CN112434884A (en) Method and device for establishing supplier classified portrait
CN113590102A (en) Zero-code rapid software development method, system, medium and equipment
CN112434970A (en) Qualification data verification method and device based on intelligent data acquisition
CN111143434A (en) Intelligent data checking method, device, equipment and storage medium
CN110633251B (en) File conversion method and equipment
CN113268665A (en) Information recommendation method, device and equipment based on random forest and storage medium
CN110110295B (en) Large sample research and report information extraction method, device, equipment and storage medium
CN110377891B (en) Method, device and equipment for generating event analysis article and computer readable storage medium
CN116775575A (en) File merging method and device, electronic equipment and storage medium
CN114338850B (en) Message checking method, device, terminal equipment and computer readable storage medium
CN110659190A (en) Quality report generation method, quality report generation device, quality report generation equipment and computer readable storage medium
CN109491622B (en) Printing method applied to transaction terminal and transaction terminal
CN113343663A (en) Bill structuring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201103
