CN113761044A - Labeling system method for labeling text into table
- Publication number
- CN113761044A (application CN202111001283.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- labeling
- text
- label
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/174—Form filling; Merging
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Document Processing Apparatus (AREA)
Abstract
A labeling system method for labeling text into tables. Text data and the labels to be used are imported into a background database through a data import module and a front-end interactive interface. Labeling proceeds in two steps. First, a conventional sequence labeling method is used to label the core elements in the text: the start position, content and label of each element are determined, the results are sent to the background database, and the element content, index ID and label are returned to the front-end interactive interface. Second, a structured labeling method generates table rows from the returned element contents and labels through an interaction of ticking elements and confirming their addition; the data of each row is then associated through the index ID and displayed as a table. If the table data is confirmed to be correct, it is imported into the database, completing the labeling task for the text. Compared with labeling in Excel, the invention greatly improves labeling efficiency and reduces errors.
Description
Technical Field
The invention relates to the technical field of text labeling, and in particular to a labeling system method for labeling text into tables.
Background
Artificial intelligence technology is widely used across science and technology industries. In the field of AI, algorithms often require a large amount of labeled data to train their models. In text processing (NLP), the main labeling tasks include text classification and information extraction, which turn unstructured or unlabeled data into structured data, or add labels to structured data. As algorithms develop, research topics become more fine-grained and the types of labeling requirements multiply. In the financial field there is one such fine-grained requirement: labeling text into structured tables. For example, the fund trading text [ 2e, 12bp and 7D in the evening, and 20 points are respectively given with 50% discount of 352378 IB 4 e; the 5000 ten thousand 50% discount, thank you ], with elements such as 2e, 12bp, 2.5kw and 7D, contains two fund trading orders whose elements need to be labeled into a structured table form. How to label such data accurately and efficiently is crucial to downstream business logic.
Information extraction is a relatively mature task in natural language processing, and relatively mature sequence labeling schemes exist for it; however, there is no mature solution for labeling text directly into a structured table. The usual practice is to perform sequence labeling on the text, copy and paste the labeled elements into spreadsheet software such as Excel, and then arrange them there. This practice has two disadvantages. First, labeling efficiency is low: the annotator must first perform sequence labeling to mark the element information in the text and then copy and paste it. The two steps involve repetitive work, which increases the labeling workload and lowers efficiency. Second, errors occur easily: the task fuses two subtasks, and when the labeling volume is large, extracting the same elements repeatedly easily leads to inconsistent boundaries for the same element, which lowers the quality of the labeled data.
Disclosure of Invention
The invention aims to provide a labeling system method for labeling text into tables, which labels text directly into a table through a two-step labeling process and, compared with labeling in Excel, greatly improves labeling efficiency and reduces errors.
In order to achieve this purpose, the invention adopts the following technical scheme: a labeling system method for labeling text into tables, comprising the following steps:
step S1, importing data: constructing the interaction modules of a front end, a back end and a database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating table rows from the returned element contents and labels through an interaction of ticking elements and confirming their addition;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table;
and step S5, saving data: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
Further, the front end in step S1 includes a text file import module and a label import module, and the back end is configured to obtain the text and label data, establish a unique data ID for each piece of data to serve as its data index ID, and associate it with the label data.
Further, the text sequence labeling in step S2 includes the following steps:
step one, determining the start position, content and label information of each element;
and step two, transmitting the label information from step one into the background database, and returning the content, index ID and label of each element to the front-end interactive interface.
Further, the structured labeling in step S3 includes the following steps:
step one, constructing an entity content display module;
step two, constructing a table row generation module and a generate button;
and step three, constructing a cache module, and storing the complete information of each entity according to the table structure.
The working principle of the invention is as follows:
The invention constructs a labeling system method for labeling text directly into tables. First, through the data import module, the text data and the labels to be used are imported into the background database via the front-end interactive interface. Second, labeling is performed in two steps. The first step uses a conventional sequence labeling method to label the core elements in the text: the start position, content and label of each element are determined, the results are transmitted into the background database, and the element content, index ID and label are returned to the front-end interactive interface. The second step is structured labeling: based on the returned element contents and labels, table rows are generated through an interaction of ticking elements and confirming their addition. The elements selected in one pass are defined as belonging to the same table row; repeating the selection generates multiple rows of table information. The data of each row is then associated through the index ID and displayed as a table; if the table data is confirmed to be correct, it is imported into the database, completing the labeling task for one text.
After the technical scheme is adopted, the invention has the beneficial effects that:
1. the labeling method of the invention combines a conventional sequence labeling method with a structured labeling method, labels text directly into a table through a two-step labeling process and, compared with labeling in Excel, greatly improves labeling efficiency and reduces errors;
2. the invention is not limited to labeling financial data and can be extended to any task that requires labeling text into tables.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a system set up diagram of the present invention.
Detailed Description
Referring to FIG. 1, the technical solution adopted by the present embodiment comprises the following steps:
step S1, importing data: constructing the interaction modules of a front end, a back end and a database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating table rows from the returned element contents and labels through an interaction of ticking elements and confirming their addition;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table. Specifically, the data is pulled from the cache for table display, and a column of operation buttons is added so that each row can be deleted. When a delete button is clicked, the back end deletes all data of that row from the cache and the front-end display is updated;
and step S5, saving data: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
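As a minimal sketch of steps S4 and S5 only, the following Python fragment keeps generated rows in an in-memory cache, renders them with an extra operation column, deletes a row on request, and writes the remaining rows to a database on save. The class `TableCache`, the function `save_rows`, the table name `labeled_rows` and the use of SQLite with JSON-encoded rows are all assumptions made for illustration; the patent does not specify a storage engine or schema.

```python
import json
import sqlite3
from typing import Dict, List, Sequence


class TableCache:
    """Hypothetical cache holding the table rows of one text (step S4)."""

    def __init__(self, data_id: str) -> None:
        self.data_id = data_id                      # index ID of the text
        self.rows: Dict[int, Dict[str, str]] = {}   # row ID -> {label: content}
        self._next_row_id = 0

    def add_row(self, row: Dict[str, str]) -> int:
        row_id = self._next_row_id
        self._next_row_id += 1
        self.rows[row_id] = dict(row)
        return row_id

    def delete_row(self, row_id: int) -> None:
        # Delete button: drop every cell of the row from the cache.
        del self.rows[row_id]

    def render(self, columns: Sequence[str]) -> List[List[str]]:
        # Labels are the columns; an extra column carries the delete operation.
        header = list(columns) + ["operation"]
        body = [
            [row.get(column, "") for column in columns] + ["delete"]
            for _, row in sorted(self.rows.items())
        ]
        return [header] + body


def save_rows(conn: sqlite3.Connection, cache: TableCache) -> None:
    """Save button (step S5): write the cached rows to the background database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labeled_rows "
        "(data_id TEXT, row_id INTEGER, row_json TEXT)"
    )
    conn.executemany(
        "INSERT INTO labeled_rows VALUES (?, ?, ?)",
        [(cache.data_id, row_id, json.dumps(row)) for row_id, row in cache.rows.items()],
    )
    conn.commit()
```

In this sketch the front end would call `render` again after each `delete_row` to refresh the displayed table, and call `save_rows` once the annotator confirms the table is correct.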
Further, the front end in step S1 includes a text file import module and a label import module, and the back end is configured to obtain the text and label data, establish a unique data ID for each piece of data to serve as its data index ID, and associate it with the label data.
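The back-end behaviour of step S1 could be sketched as follows, assuming the imported texts and labels arrive as plain Python lists. The names `ImportedText` and `import_data`, and the choice of UUIDs as index IDs, are illustrative assumptions; the patent only requires that each piece of data receive a unique ID that is associated with the label data.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ImportedText:
    data_id: str            # unique data index ID assigned by the back end
    text: str               # raw text to be labeled
    label_names: List[str]  # labels associated with this text


def import_data(texts: Sequence[str], label_names: Sequence[str]) -> Dict[str, ImportedText]:
    """Assign a unique index ID to each text and associate it with the label set."""
    store: Dict[str, ImportedText] = {}
    for text in texts:
        data_id = uuid.uuid4().hex  # assumption: any collision-free ID scheme would do
        store[data_id] = ImportedText(data_id, text, list(label_names))
    return store
```

Keying the store by `data_id` lets later steps look up the text, its sequence labels and its table rows through the same index ID.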
Further, the text sequence labeling in step S2 includes the following steps:
step one, determining the start position, content and label information of each element. Specifically, a text display module with selectable text is constructed at the front end; texts are pulled from the back end one at a time and displayed. By monitoring the mouse click signal, the position of the cursor when the user clicks is obtained and the corresponding text position is calculated as the start value S1; after the mouse is dragged, the click-release signal is obtained, and the cursor position at release is used to calculate the text position as the end value S2. The text in the interval defined by S1 and S2 is extracted as the labeled entity information. The keyboard characters entered by the user are then monitored, and the corresponding label is obtained by matching on the entered character;
and step two, transmitting the label information from step one into the background database, returning the content, index ID and label of each element to the front-end interactive interface, and constructing a cache module that stores the sequence labeling information of the text. The stored content is "start position S1, end position S2, entity content, entity label".
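The two steps above can be illustrated with a short Python sketch that turns a pair of character offsets and a key press into one stored record. The key-to-label mapping `KEY_TO_LABEL`, the names `Annotation`, `annotate` and `SequenceBuffer`, and the convention that S2 is an exclusive end offset are assumptions for illustration; the patent does not fix any of them.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical shortcut keys; the patent only says a typed character is matched to a label.
KEY_TO_LABEL = {"a": "amount", "r": "rate", "t": "term"}


@dataclass(frozen=True)
class Annotation:
    start: int    # S1, offset of the first selected character
    end: int      # S2, assumed here to be exclusive
    content: str  # entity content, text[start:end]
    label: str    # entity label matched from the key press


def annotate(text: str, s1: int, s2: int, key: str) -> Annotation:
    """Build one sequence-labeling record from a mouse selection and a key press."""
    start, end = sorted((s1, s2))  # the user may drag from right to left
    if not 0 <= start < end <= len(text):
        raise ValueError("selection is empty or outside the text")
    if key not in KEY_TO_LABEL:
        raise KeyError(f"no label bound to key {key!r}")
    return Annotation(start, end, text[start:end], KEY_TO_LABEL[key])


class SequenceBuffer:
    """Cache of sequence-labeling records, grouped by the index ID of the text."""

    def __init__(self) -> None:
        self._items: Dict[str, List[Annotation]] = {}

    def add(self, data_id: str, annotation: Annotation) -> None:
        self._items.setdefault(data_id, []).append(annotation)

    def get(self, data_id: str) -> List[Annotation]:
        return list(self._items.get(data_id, []))
```

Storing the offsets together with the content means each element is extracted once, which is how a design of this kind avoids the inconsistent boundaries that arise when the same element is copied repeatedly.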
Further, the structured labeling in step S3 includes the following steps:
step one, constructing an entity content display module, which pulls the labeled entity contents from the cache and displays them as selectable items; each item has two display states, "selected" and "unselected";
step two, constructing a table row generation module and a generate button, which triggers data generation when clicked. The generation logic is as follows: taking the labels as columns and all selected entity contents as one row, a new row of structured data is constructed from the entity contents selected in the display module and the label attributes of those entities in the cache;
and step three, constructing a cache module, and storing the complete information of each entity (including the start position S1, the end position S2, the entity content and the entity label) as a unit according to the table structure.
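The row generation logic of step two can be sketched as below, with labels as columns and each cell holding the complete entity record as step three requires. The function name `generate_row` and the decision to reject two selected entities with the same label in one row are assumptions; the patent does not say how such a conflict is handled.

```python
from typing import Dict, Optional, Sequence, Tuple

# One cached entity: (start S1, end S2, entity content, entity label).
Entity = Tuple[int, int, str, str]
Row = Dict[str, Optional[Entity]]


def generate_row(columns: Sequence[str], entities: Sequence[Entity],
                 selected: Sequence[int]) -> Row:
    """Build one table row from the entities ticked in the display module."""
    row: Row = {column: None for column in columns}  # labels are the columns
    for index in selected:
        entity = entities[index]
        label = entity[3]
        if label not in row:
            raise KeyError(f"label {label!r} is not a table column")
        if row[label] is not None:
            # Assumption: one entity per column per row.
            raise ValueError(f"column {label!r} already filled in this row")
        row[label] = entity  # store the complete entity information as a unit
    return row
```

Calling `generate_row` once per click of the generate button, each time with a different selection, yields the multiple rows of the table; columns with no selected entity stay empty.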
The embodiment constructs a labeling system method for labeling text directly into tables. First, through the data import module, the text data and the labels to be used are imported into the background database via the front-end interactive interface. Second, labeling is performed in two steps. The first step uses a conventional sequence labeling method to label the core elements in the text: the start position, content and label of each element are determined, the results are transmitted into the background database, and the element content, index ID and label are returned to the front-end interactive interface. The second step is structured labeling: based on the returned element contents and labels, table rows are generated through an interaction of ticking elements and confirming their addition. The elements selected in one pass are defined as belonging to the same table row; repeating the selection generates multiple rows of table information. The data of each row is then associated through the index ID and displayed as a table; if the table data is confirmed to be correct, it is imported into the database, completing the labeling task for one text.
With the above technical scheme, the beneficial effects of this embodiment are:
1. the labeling method of the invention combines a conventional sequence labeling method with a structured labeling method, labels text directly into a table through a two-step labeling process and, compared with labeling in Excel, greatly improves labeling efficiency and reduces errors;
2. the embodiment is not limited to labeling financial data and can be extended to any task that requires labeling text into tables.
The above description is only for the purpose of illustrating the technical solutions of the present invention and not for the purpose of limiting the same, and other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the scope of the claims of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (4)
1. A labeling system method for labeling text into tables, characterized in that it comprises the following steps:
step S1, importing data: constructing the interaction modules of a front end, a back end and a database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating table rows from the returned element contents and labels through an interaction of ticking elements and confirming their addition;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table;
and step S5, saving data: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
2. The method of claim 1, wherein: the front end comprises a text file import module and a label import module, and the back end is used for acquiring the text and label data, establishing a unique data ID for each piece of data to serve as its data index ID, and associating it with the label data.
3. The method of claim 1, wherein: the text sequence labeling in step S2 comprises the following steps:
step one, determining the start position, content and label information of each element;
and step two, transmitting the label information from step one into the background database, and returning the content, index ID and label of each element to the front-end interactive interface.
4. The method of claim 1, wherein: the structured labeling in step S3 comprises the following steps:
step one, constructing an entity content display module;
step two, constructing a table row generation module and a generate button;
and step three, constructing a cache module, and storing the complete information of each entity according to the table structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111001283.8A CN113761044A (en) | 2021-08-30 | 2021-08-30 | Labeling system method for labeling text into table |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111001283.8A CN113761044A (en) | 2021-08-30 | 2021-08-30 | Labeling system method for labeling text into table |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113761044A (en) | 2021-12-07 |
Family
ID=78791780
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111001283.8A (Pending), CN113761044A (en) | 2021-08-30 | 2021-08-30 | Labeling system method for labeling text into table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113761044A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543153A (en) * | 2018-11-13 | 2019-03-29 | 成都数联铭品科技有限公司 | A kind of sequence labelling system and method |
WO2019237540A1 (en) * | 2018-06-12 | 2019-12-19 | 平安科技(深圳)有限公司 | Method and device for acquiring financial data, terminal device, and medium |
WO2020108257A1 (en) * | 2018-11-28 | 2020-06-04 | 腾讯科技(深圳)有限公司 | Method and device for automatically splitting table content into columns, computer apparatus, and storage medium |
CN112883687A (en) * | 2021-02-05 | 2021-06-01 | 北京科技大学 | Law contract interactive labeling method based on contract text markup language |
CN113177124A (en) * | 2021-05-11 | 2021-07-27 | 北京邮电大学 | Vertical domain knowledge graph construction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||