CN113761044A - Labeling system method for labeling text into table - Google Patents

Labeling system method for labeling text into table

Info

Publication number
CN113761044A
Authority
CN
China
Prior art keywords
data
labeling
text
label
constructing
Prior art date
Legal status
Pending
Application number
CN202111001283.8A
Other languages
Chinese (zh)
Inventor
杨育纯
周靖宇
钟淑仪
陈巧玲
符威
邹鸿岳
Current Assignee
Shanghai Kuaique Information Technology Co ltd
Original Assignee
Shanghai Kuaique Information Technology Co ltd
Priority date
2021-08-30
Filing date
2021-08-30
Publication date
2021-12-07
Application filed by Shanghai Kuaique Information Technology Co ltd
Priority to CN202111001283.8A
Publication of CN113761044A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A labeling system method for labeling text into tables. A data import module and a front-end interactive interface import the text data and labels to be labeled into a background database. Labeling then proceeds in two steps. In the first step, conventional sequence labeling is used to label the core elements of the text. The start position, content and label of each element are determined, the results are transmitted to the background database, and the element content, index ID and label are returned to the front-end interactive interface. In the second step, structured labeling generates a table list from the returned element contents and labels through a check-and-add interaction. Each row of data is associated again through the index ID and displayed as a table. Once the table data is confirmed to be correct, it is imported into the database, completing the text labeling task. Compared with Excel, the invention greatly improves labeling efficiency and reduces errors.

Description

Labeling system method for labeling text into table
Technical Field
The invention relates to the technical field of text labeling, and in particular to a labeling system method for labeling text into tables.
Background
Artificial intelligence technology is now widely used across industries. In the AI field, algorithms usually need large amounts of labeled data to train their models. In natural language processing (NLP), the main labeling tasks include text classification and information extraction. These tasks turn unstructured or unlabeled data into structured data, or add labels to structured data. As algorithms develop, research topics become more specialized and labeling requirements become more varied. The financial field has one such specialized requirement: labeling text into structured tables. For example, the capital-transaction text "2e, 12bp and 2.5kw of 7D [2e, 12bp and 7D in the evening], and 20 points are respectively given with 50% discount of 352378 IB 4 e; the 50 million 50% discount, thank you" contains two fund-trading orders. The elements of these orders need to be labeled in the following structured form. Labeling such data accurately and efficiently is crucial for downstream business logic.
Information extraction is a relatively mature task in natural language processing, and well-established sequence labeling schemes exist for it. However, there is no mature solution for labeling text directly into a structured table. The usual approach has three stages. The text is first labeled with sequence labeling. The labeled elements are then copied and pasted into a spreadsheet such as Excel. Finally, they are arranged there. This approach has two disadvantages. First, labeling efficiency is low. The annotator must first perform sequence labeling to mark the element information in the text and then do the copying and pasting. The two steps duplicate work, which increases the labeling workload and reduces efficiency. Second, the approach is error-prone. The task fuses two subtasks, and repeatedly extracting the same elements under a heavy labeling workload easily leads to inconsistent boundaries for the same element. This lowers the quality of the labeled data.
Disclosure of Invention
The invention aims to provide a labeling system method for labeling text into tables that labels data directly into a table in two labeling steps. Compared with labeling in Excel, the system greatly improves labeling efficiency and reduces errors.
To achieve this purpose, the invention adopts the following technical solution: a labeling system method for labeling text into tables, comprising the following steps:
step S1, data import: constructing interaction modules for the front end, the back end and the database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating a table list from the returned element contents and labels through a check-and-add interaction;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table;
and step S5, data storage: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
Further, in step S1 the front end includes a text-file import module and a label import module. The back end obtains the text and label data and assigns each piece of data a unique data ID, which serves as its data index ID. It then associates this ID with the label data.
Further, the text sequence labeling in step S2 includes the following steps:
step one, determining the start position, content and label information of each element;
and step two, transmitting the label information from step one to the background database and returning the element content, index ID and label to the front-end interactive interface.
Further, the structured labeling in step S3 includes the following steps:
step one, constructing an entity content display module;
step two, constructing a table-row generation module with a Generate button;
and step three, constructing a buffer module that stores the complete information of each entity according to the table structure.
The working principle of the invention is as follows:
The invention constructs a labeling system method for labeling text directly into tables. First, a data import module imports the text data and labels to be labeled into a background database through a front-end interactive interface. Second, labeling proceeds in two steps. The first step uses conventional sequence labeling. The core elements of the text are labeled, and the start position, content and label of each element are determined. The results are transmitted to the background database, and the element content, index ID and label are returned to the front-end interactive interface. The second step is structured labeling: a table list is generated from the returned element contents and labels through a check-and-add interaction. The elements selected in each pass are defined as one row of table data, and repeating this several times produces multiple rows of table information. Each row of data is then associated again through the index ID and displayed as a table. Once the table data is confirmed to be correct, it is imported into the database, completing the labeling task for one text.
By adopting the above technical solution, the invention has the following beneficial effects:
1. the invention combines conventional sequence labeling with structured labeling and labels text directly into a table in two steps; compared with labeling in Excel, the system greatly improves labeling efficiency and reduces errors;
2. the invention is not limited to labeling financial data and can be extended to any task that requires labeling text into tables.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of the system setup of the present invention.
Detailed Description
Referring to FIG. 1, the technical solution adopted by this embodiment comprises the following steps:
step S1, data import: constructing interaction modules for the front end, the back end and the database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating a table list from the returned element contents and labels through a check-and-add interaction;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table. Specifically, the data is pulled from the cache for display, and a delete operation button is added to each row. When a row's delete button is clicked, the back end deletes all data of that row from the cache and the front-end display is updated;
and step S5, data storage: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
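Steps S4 and S5 can be sketched in Python as shown below. This is an illustrative sketch only. The `TableCache` class, the `labeled_rows` table, storing each row as JSON, and the example labels (`amount`, `rate`) are all assumptions. Python's built-in `sqlite3` stands in for the unspecified background database, and a plain-text rendering stands in for the front-end table.

```python
import json
import sqlite3


class TableCache:
    """Hypothetical cache holding the generated table rows of one text."""

    def __init__(self, data_id: str) -> None:
        self.data_id = data_id                # data index ID from step S1
        self.rows: list[dict[str, str]] = []  # each row maps label (column) -> entity content

    def add_row(self, row: dict[str, str]) -> None:
        self.rows.append(row)

    def delete_row(self, index: int) -> None:
        # Row-level delete button: remove all data of that row from the cache.
        del self.rows[index]

    def render(self) -> str:
        # Step S4: pull the rows from the cache and lay them out as a plain-text table.
        columns = sorted({label for row in self.rows for label in row})
        lines = [" | ".join(columns)]
        for row in self.rows:
            lines.append(" | ".join(row.get(column, "") for column in columns))
        return "\n".join(lines)


def save(cache: TableCache, conn: sqlite3.Connection) -> int:
    """Step S5: the save button writes every cached row, keyed by the index ID, to the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS labeled_rows (data_id TEXT, row_json TEXT)")
    conn.executemany(
        "INSERT INTO labeled_rows (data_id, row_json) VALUES (?, ?)",
        [(cache.data_id, json.dumps(row, ensure_ascii=False)) for row in cache.rows],
    )
    conn.commit()
    return len(cache.rows)
```

In this sketch, deleting a row only changes the cache. Nothing reaches the database until `save` runs, which follows the order of steps S4 and S5. After each deletion, the front end would call `render` again to update the display.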
Further, in step S1 the front end includes a text-file import module and a label import module. The back end obtains the text and label data and assigns each piece of data a unique data ID, which serves as its data index ID. It then associates this ID with the label data.
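The following is a minimal sketch of the back-end side of step S1. It assumes that every imported text shares one label set and that a random UUID is an acceptable unique data index ID. The `TextRecord` and `import_texts` names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TextRecord:
    data_id: str                                      # unique data index ID
    text: str                                         # raw text to be labeled
    labels: list[str] = field(default_factory=list)   # label set associated with the text


def import_texts(texts: list[str], labels: list[str]) -> dict[str, TextRecord]:
    """Give every imported text a unique index ID and associate the label set with it."""
    store: dict[str, TextRecord] = {}
    for text in texts:
        data_id = uuid.uuid4().hex  # random 32-character hex string used as the index ID
        store[data_id] = TextRecord(data_id, text, list(labels))
    return store
```

The store is keyed by the index ID, so later steps (sequence labeling, row generation and storage) can all refer back to the same text through that one key.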
Further, the text sequence labeling in step S2 includes the following steps:
step one, determining the start position, content and label information of each element. Specifically, a text display module is constructed at the front end in which the text can be selected. Texts are pulled from the back end one at a time and displayed. The module listens for the mouse-press signal to obtain the cursor position where the user clicks, and converts it into a text position, the start value S1. After the user drags the mouse and releases the button, the module captures the release signal and again converts the cursor position into a text position, the end value S2. The text in the interval defined by S1 and S2 is extracted as the labeled entity information. The module then monitors the keyboard characters typed by the user and obtains the corresponding label by matching against those characters;
and step two, transmitting the label information from step one to the background database and returning the element content, index ID and label to the front-end interactive interface. A buffer module is also constructed to store the sequence labeling information of the text. Each stored record contains "start position S1, end position S2, entity content, entity label".
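Step S2 can be sketched as shown below. The sketch assumes that S1 and S2 are character offsets into the displayed text, with S2 exclusive. It also assumes a hypothetical one-key-per-label shortcut table (`SHORTCUTS`). The `EntityMark`, `mark_entity` and `store_mark` names and the in-memory `buffer` are likewise illustrative stand-ins for the front-end and buffer modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMark:
    start: int    # S1: offset where the mouse button was pressed
    end: int      # S2: offset where the mouse button was released (exclusive)
    content: str  # text between S1 and S2
    label: str    # label matched from the key the user typed


# Hypothetical shortcut table: one key per label from the imported label set.
SHORTCUTS = {"a": "amount", "r": "rate", "t": "tenor"}

# Buffer module: data index ID -> sequence-labeling records of that text.
buffer: dict[str, list[EntityMark]] = {}


def mark_entity(text: str, s1: int, s2: int, key: str) -> EntityMark:
    """Turn a mouse selection between S1 and S2 plus one typed key into a labeled entity."""
    start, end = sorted((s1, s2))  # the user may drag right-to-left
    if start == end or start < 0 or end > len(text):
        raise ValueError("empty or out-of-range selection")
    label = SHORTCUTS.get(key.lower())
    if label is None:
        raise KeyError(f"no label is bound to key {key!r}")
    return EntityMark(start, end, text[start:end], label)


def store_mark(data_id: str, mark: EntityMark) -> tuple[str, str, str]:
    """Save the record in the buffer and return (content, index ID, label) to the front end."""
    buffer.setdefault(data_id, []).append(mark)
    return mark.content, data_id, mark.label
```

The sketch sorts the two offsets so that the user can drag in either direction. The patent text does not say how reverse drags are handled, so this is a design choice of the sketch, not of the described method.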
Further, the structured labeling in step S3 includes the following steps:
step one, constructing an entity content display module that pulls the labeled entity contents from the buffer and displays them as selectable items. Each item is in one of two states, "selected" or "unselected";
step two, constructing a table-row generation module with a Generate button that performs data generation when clicked. The generation logic builds a new row of structured data from two inputs: the entity contents chosen in the selection module, and the label attribute of each entity in the cache. The labels become the columns, and all selected entity contents form one row;
and step three, constructing a buffer module that stores the complete information of each entity (start position S1, end position S2, entity content and entity label) as one unit according to the table structure.
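The row-generation logic of step S3 might look like the sketch below: labels become columns, and the checked entities form one row. The `Entity` record mirrors the stored fields (S1, S2, content, label). The following are assumptions: the rule that a row may hold only one entity per label, and the `toggle`, `generate_row` and `table_buffer` names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int    # S1: start offset of the entity in the text
    end: int      # S2: end offset of the entity in the text
    content: str  # labeled entity text
    label: str    # entity label, used as the table column


def toggle(selected: set[int], index: int) -> None:
    """Flip one displayed entity between the "selected" and "unselected" states."""
    if index in selected:
        selected.remove(index)
    else:
        selected.add(index)


def generate_row(entities: list[Entity], selected: set[int]) -> dict[str, Entity]:
    """Build one table row: labels become columns, the selected entities fill them."""
    row: dict[str, Entity] = {}
    for index in sorted(selected):
        entity = entities[index]
        if entity.label in row:
            # Assumption: a row holds at most one entity per label (column).
            raise ValueError(f"two selected entities share the label {entity.label!r}")
        row[entity.label] = entity
    return row


# Buffer module: each row keeps complete entity records (S1, S2, content, label).
table_buffer: list[dict[str, Entity]] = []
```

Each cell keeps the whole `Entity` rather than just its text. As a result, the buffered rows retain S1, S2, content and label as one unit, as step three describes.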
This embodiment constructs a labeling system method for labeling text directly into tables. First, a data import module imports the text data and labels to be labeled into a background database through a front-end interactive interface. Second, labeling proceeds in two steps. The first step uses conventional sequence labeling. The core elements of the text are labeled, and the start position, content and label of each element are determined. The results are transmitted to the background database, and the element content, index ID and label are returned to the front-end interactive interface. The second step is structured labeling: a table list is generated from the returned element contents and labels through a check-and-add interaction. The elements selected in each pass are defined as one row of table data, and repeating this several times produces multiple rows of table information. Each row of data is then associated again through the index ID and displayed as a table. Once the table data is confirmed to be correct, it is imported into the database, completing the labeling task for one text.
By adopting the above technical solution, this embodiment has the following beneficial effects:
1. the invention combines conventional sequence labeling with structured labeling and labels text directly into a table in two steps; compared with labeling in Excel, the system greatly improves labeling efficiency and reduces errors;
2. this embodiment is not limited to labeling financial data and can be extended to any task that requires labeling text into tables.
The above description is only for the purpose of illustrating the technical solutions of the present invention and not for the purpose of limiting the same, and other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the scope of the claims of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (4)

1. A labeling system method for labeling text into a table, characterized in that it comprises the following steps:
step S1, data import: constructing interaction modules for the front end, the back end and the database;
step S2, text sequence labeling: labeling the core elements in the text;
step S3, structured labeling: generating a table list from the returned element contents and labels through a check-and-add interaction;
step S4, table display: associating the data of each row through the data information established in step S1 and displaying the data as a table;
and step S5, data storage: constructing a save button which, when clicked, writes the labeled structured data into the background database, completing the full labeling process for one text.
2. The method of claim 1, wherein the front end comprises a text-file import module and a label import module, and the back end acquires the text and label data, assigns each piece of data a unique data ID that serves as its data index ID, and associates this ID with the label data.
3. The method of claim 1, wherein the text sequence labeling in step S2 comprises the following steps:
step one, determining the start position, content and label information of each element;
and step two, transmitting the label information from step one to the background database and returning the element content, index ID and label to the front-end interactive interface.
4. The method of claim 1, wherein the structured labeling in step S3 comprises the following steps:
step one, constructing an entity content display module;
step two, constructing a table-row generation module with a Generate button;
and step three, constructing a buffer module that stores the complete information of each entity according to the table structure.
CN202111001283.8A 2021-08-30 2021-08-30 Labeling system method for labeling text into table Pending CN113761044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111001283.8A CN113761044A (en) 2021-08-30 2021-08-30 Labeling system method for labeling text into table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111001283.8A CN113761044A (en) 2021-08-30 2021-08-30 Labeling system method for labeling text into table

Publications (1)

Publication Number Publication Date
CN113761044A (en) 2021-12-07

Family

ID=78791780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111001283.8A Pending CN113761044A (en) 2021-08-30 2021-08-30 Labeling system method for labeling text into table

Country Status (1)

Country Link
CN (1) CN113761044A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543153A (en) * 2018-11-13 2019-03-29 成都数联铭品科技有限公司 A kind of sequence labelling system and method
WO2019237540A1 (en) * 2018-06-12 2019-12-19 平安科技(深圳)有限公司 Method and device for acquiring financial data, terminal device, and medium
WO2020108257A1 (en) * 2018-11-28 2020-06-04 腾讯科技(深圳)有限公司 Method and device for automatically splitting table content into columns, computer apparatus, and storage medium
CN112883687A (en) * 2021-02-05 2021-06-01 北京科技大学 Law contract interactive labeling method based on contract text markup language
CN113177124A (en) * 2021-05-11 2021-07-27 北京邮电大学 Vertical domain knowledge graph construction method and system

Similar Documents

Publication Publication Date Title
US11574201B2 (en) Enhancing evolutionary optimization in uncertain environments by allocating evaluations via multi-armed bandit algorithms
US8959122B2 (en) Data processing device
CN110825882A (en) Knowledge graph-based information system management method
CN109165384A (en) A kind of name entity recognition method and device
CN100444591C (en) Method for acquiring front-page keyword and its application system
US20210357469A1 (en) Method for evaluating knowledge content, electronic device and storage medium
CN109783539A (en) Usage mining and its model building method, device and computer equipment
US8095575B1 (en) Word processor data organization
US20220269853A1 (en) Domain-specific language interpreter and interactive visual interface for rapid screening
CN114691831A (en) Task-type intelligent automobile fault question-answering system based on knowledge graph
Zhang et al. Effective subword segmentation for text comprehension
Chen et al. Crossdata: Leveraging text-data connections for authoring data documents
CN116468010A (en) Report generation method, device, terminal and storage medium
CN109710250A (en) It is a kind of for constructing the visualization engine system and method for user interface
CN111191429A (en) System and method for automatic filling of data table
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN109445794A (en) A kind of page building method and device
Streit et al. A spreadsheet approach to facilitate visualization of uncertainty in information
CN115309885A (en) Knowledge graph construction, retrieval and visualization method and system for scientific and technological service
US20240037084A1 (en) Method and apparatus for storing data
CN116304236A (en) User portrait generation method and device, electronic equipment and storage medium
CN113761044A (en) Labeling system method for labeling text into table
CN108766513B (en) Intelligent health medical data structured processing system
Zhu-Tian et al. CrossData: Leveraging Text-Data Connections for Authoring Data Documents
She et al. An automatic page code generation method based on excel template and poi technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination