CN116150613A - Information extraction model training method, information extraction method and device - Google Patents

Information extraction model training method, information extraction method and device

Info

Publication number
CN116150613A
CN116150613A (application CN202210978578.9A)
Authority
CN
China
Prior art keywords
structured
training
information extraction
model
prompter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210978578.9A
Other languages
Chinese (zh)
Inventor
丁隆耀
蒋宁
吴海英
李宽
吕乐宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority claimed from application CN202210978578.9A
Publication of CN116150613A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

An information extraction model training method and device and an information extraction method and device are provided. The training method comprises the following steps: constructing a corresponding structured prompter template for each of a plurality of joint information extraction tasks to obtain a plurality of structured prompter templates; acquiring first training sample data and second training sample data; pre-training a reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model; and performing fine-tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model. The information extraction model is used to extract, from input text data, structured languages respectively corresponding to the plurality of joint information extraction tasks based on the prompts of the constructed structured prompter templates.

Description

Information extraction model training method, information extraction method and device
Technical Field
The present invention relates to the field of natural language processing, and in particular, to an information extraction model training method, an information extraction method, and an information extraction device.
Background
Financial institutions such as securities firms and banks generally need to extract information from important macro-level financial news, meso-level industry news, and micro-level enterprise news. Macro-level economic news and indicators can be handled with rule-based configuration because the institutions' points of interest are relatively fixed. For meso-level industry news and micro-level enterprise news, where the volume of news is large and manpower is limited, automatic information extraction is generally needed to efficiently extract information about the relevant industries and enterprises. In general, business personnel care about what specific event a piece of news describes and want the specific sentiment direction of the news (positive, negative, neutral) and similar information to be given automatically.
The above information extraction problems can be grouped into several categories: entity recognition, relation extraction, and event extraction. In addition, in financial scenarios, the sentiment analysis task mentioned above (also referred to as the viewpoint extraction task) needs to be completed as well.
In the prior art, when various information extraction subtasks such as entity recognition, relation extraction, event extraction, and sentiment analysis are performed, the subtasks are typically executed in a pipeline; in other words, each subtask is split out and handled on its own. However, the many models required by this pipeline approach need different training data, different models, and different training methods for each task, which is highly redundant and inefficient to execute. Moreover, since the above information extraction subtasks target the same corpus, there is a certain correlation between them, but the pipeline approach of the prior art ignores this correlation, making it difficult to exploit it well in subsequent processing.
Disclosure of Invention
The present disclosure provides an information extraction model training method, an information extraction method, and corresponding apparatuses, which at least solve the problem in the related art of low execution efficiency when performing multiple related information extraction tasks, and which also make good use of the correlation among those tasks.
According to a first aspect of embodiments of the present disclosure, there is provided an information extraction model training method, including: constructing a corresponding structured prompter template for each of a plurality of joint information extraction tasks to obtain a plurality of structured prompter templates; acquiring first training sample data and second training sample data, wherein the first training sample data comprises first text data and a first structured language label corresponding to the first text data, and the second training sample data comprises second text data and a second structured language label corresponding to the second text data; pre-training a reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model; and performing fine-tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model, wherein the information extraction model is used to extract, from input text data, structured languages respectively corresponding to the plurality of joint information extraction tasks based on the prompts of the constructed structured prompter templates.
In the above method, a unified structured language and unified structured prompter templates are adopted for multiple information extraction tasks on the same corpus, so that the information extraction tasks can be performed using the same training data, the same model, and the same training method, reducing the redundancy and inefficiency caused by training multiple models in the prior art. In addition, because the method trains with unified structured prompter templates, the correlation among the information extraction subtasks is well exploited.
According to a second aspect of the embodiments of the present disclosure, there is provided an information extraction method, including: acquiring text data to be processed and a plurality of structured prompter templates respectively constructed for a plurality of joint information extraction tasks; and inputting the text data to be processed and the plurality of structured prompter templates into an information extraction model to obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks, wherein the information extraction model is used to extract, from input text data, structured languages respectively corresponding to the plurality of joint information extraction tasks based on the prompts of the constructed structured prompter templates.
In the above information extraction method, the same model and a single prediction are used to extract the structured languages corresponding to the plurality of joint information extraction tasks from the text data, so the correlation between these information extraction tasks can be effectively utilized. Further, the text extracted by the above information extraction method has a uniform structure, so the re-structuring operation that would otherwise be needed when model extraction results (i.e., text) have non-uniform structures can be omitted. In addition, owing to the structured language and the structured prompter templates, the information extraction model can be adapted to various fields, which improves the generality of the model.
According to a third aspect of the embodiments of the present disclosure, there is provided an information extraction model training apparatus, including: a template construction unit configured to construct a corresponding structured prompter template for each of a plurality of joint information extraction tasks to obtain a plurality of structured prompter templates; a training data acquisition unit configured to acquire first training sample data and second training sample data, wherein the first training sample data comprises first text data and a first structured language label corresponding to the first text data, and the second training sample data comprises second text data and a second structured language label corresponding to the second text data; and a training unit configured to: pre-train a reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model, and perform fine-tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model, wherein the information extraction model is used to extract, from input text data, structured languages respectively corresponding to the plurality of joint information extraction tasks based on the prompts of the constructed structured prompter templates.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an information extraction apparatus, including: an input unit configured to acquire text data to be processed and a plurality of structured prompter templates respectively constructed for a plurality of joint information extraction tasks; and an information extraction unit configured to input the text data to be processed and the plurality of structured prompter templates into an information extraction model to obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks, wherein the information extraction model is used to extract, from the input text data, the structured languages respectively corresponding to the plurality of joint information extraction tasks based on the prompts of the constructed structured prompter templates.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the aforementioned information extraction model training method and/or information extraction method.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by at least one processor, cause the at least one processor to perform the aforementioned information extraction model training method and/or information extraction method.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the aforementioned information extraction model training method and/or information extraction method.
It can be seen that, by applying the information extraction model training method and apparatus and the information extraction method and apparatus according to the exemplary embodiments of the present disclosure, a plurality of joint information extraction tasks can be performed using the same training data, the same model, and the same training method, reducing the redundancy and inefficiency caused by training multiple models in the prior art. Since the text extracted by the information extraction method of the present disclosure has a uniform structure, the re-structuring operation otherwise needed when model extraction results (i.e., text) have non-uniform structures can be omitted. Moreover, the structured language and the structured prompter templates allow the information extraction model of the present disclosure to be adapted to various fields, improving the generality of the model. In addition, training with unified structured prompter templates makes good use of the correlation among the information extraction subtasks.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flowchart illustrating an information extraction model training method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating an information extraction task according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating an information extraction method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram schematically illustrating information extraction according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating an information extraction model training apparatus according to an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram illustrating an information extraction apparatus according to an exemplary embodiment of the present disclosure.
Fig. 7 is an application scenario illustrating an information extraction model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a diagram illustrating an implementation environment of an information extraction model training method and an information extraction method according to an exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Hereinafter, the present application will be described in detail with reference to the drawings, wherein the same or similar elements will be designated with the same or similar reference numerals throughout the drawings.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of the embodiments of the disclosure defined by the claims and their equivalents. Various specific details are included to aid understanding, but are merely to be considered exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to written meanings, but are used only by the inventors to achieve a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following descriptions of the various embodiments of the present disclosure are provided for illustration only and not for the purpose of limiting the disclosure as defined by the claims and their equivalents.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Prior to the beginning of the description of the present disclosure, some technical contents and terms that may be used in the specification of the present disclosure will be explained so that the present disclosure can be more easily understood.
Named entity recognition (also known as entity recognition) is an important step in natural language processing (NLP) applications; it detects not only entity boundaries but also the types of named entities, and it is the basis for understanding the meaning of text. In the information extraction scenario, entity recognition can recognize specific words such as times, places, persons, and organizations as the basis of the subsequent relation extraction and event extraction tasks. Entity recognition is a basic NLP task.
Relation extraction is mainly responsible for identifying entities from unstructured text and extracting the semantic relationships between them. For example, from the sentence "[A] is the son of [B], the chairman of the board of Group A", it can be identified that entity [A] and entity [B] have a father-son relationship. The general relation extraction approach is to first identify the entities and then determine the relationship between them with a classification model (for example, a long short-term memory (LSTM) model, a convolutional neural network (CNN) model, a Transformer model, or a BERT (Bidirectional Encoder Representations from Transformers) model).
Event extraction is the extraction of events of interest to the user from unstructured information and their presentation in a structured form. An event is an occurrence or change of state within a certain time period and geographic area, composed of one or more actions in which one or more participants are involved. For event types, different actions or state changes represent different types of events; for event elements, different times, places, and elements within the same type of event represent different event instances. Event extraction extracts event information from natural language text and presents it in structured form. By way of example only, for a sentence stating that two named people were married at a certain place, fields such as the following can be extracted:
[Table: the extracted event fields are shown as an image in the original publication and are not reproduced here.]
Compared with relation extraction, event extraction requires identifying several classes of elements, such as arguments, trigger words, and roles. These can typically be identified by named entity recognition and then combined and matched, by pattern matching or deep learning, into the event structured data described above.
Emotion (sentiment) recognition determines from a text whether it evaluates something positively, negatively, or neutrally. It is usually not treated as an information extraction task but as a relatively independent subtask. The common approach is text classification: texts are labeled with one of three sentiments (positive, neutral, negative), and the problem is solved directly as text classification.
T5: t5, collectively called Text-to-Text Transfer Transformer, is a model architecture or paradigm for solving Natural Language Processing (NLP) tasks. All tasks (e.g. classification, similarity calculation, text generation) are put into a Text-to-Text framework for resolution. This has the advantage that all problems can be fit into a unified paradigm, so that the same model architecture, the same training strategy, the same loss function, the same decoding means can be used.
RoBERTa (A Robustly Optimized BERT Pretraining Approach): similar to BERT in the prior art, with several main adjustments made on top of BERT: 1) longer training time, larger batch size, and more training data; 2) removal of the next-sentence prediction loss; 3) longer training sequences; 4) a dynamic masking mechanism. RoBERTa is widely used in NLP tasks because it performs better than BERT in many scenarios.
Prompt Learning: a recently proposed paradigm for NLP training and learning. Unlike the currently common pre-training + fine-tuning approach, prompt learning does not adapt the pre-trained language model (LM) to downstream tasks through objective engineering; instead, with the help of textual prompts, it reformulates the downstream task so that it looks more like the task solved during the original LM training. For example, in a typical information extraction task, the inputs and outputs are typically as follows:
Input: I like this movie
Output: "positive" or "negative"
Whereas if prompt learning is used, the task can become a fill-in-the-blank ("cloze") problem:
Input: I like this movie. Overall, it is a __ movie
Output: "interesting" or "boring".
Information Extraction (IE): extracting specific event or fact information from natural language text, so that massive amounts of content can be automatically classified, extracted, and reconstructed. Such information typically includes entities, relationships, and events. Information extraction mainly comprises three subtasks: relation extraction, named entity recognition, and event extraction.
In the prior art, when various information extraction subtasks such as entity recognition, relation recognition, event extraction, and sentiment analysis are performed on language text such as news, the subtasks are typically executed separately on the news in a pipeline. In other words, each subtask is split out and handled on its own, with the entity recognition task generally executed first.
However, the many models required by this pipeline approach of the prior art need different training data, different models, and different training methods for each task, which is highly redundant. Moreover, when these information extraction subtasks are executed in different fields, different definitions of the subtasks are needed; for example, the event types that need to be identified differ between the securities and medical industries, so a great deal of preprocessing work is required before a task can be performed. In addition, since the information extraction subtasks target the same corpus, there is a certain correlation between them, but the pipeline approach of the prior art ignores this correlation, making it difficult to exploit in subsequent processing.
Therefore, to address the above problems in the prior art, the present application provides an information extraction model training method and apparatus in which a unified structured language and unified structured prompter templates are adopted for multiple information extraction tasks on the same corpus. The tasks can thus be performed using the same training data, the same model, and the same training method, reducing the redundancy and inefficiency caused by training multiple models in the prior art; and because the text extracted according to the scheme of the present disclosure has a unified structure, the re-structuring operation otherwise needed when model extraction results (i.e., text) have non-uniform structures can be omitted. Moreover, the structured language and the structured prompter templates allow the information extraction model of the present disclosure to be adapted to various fields, improving the generality of the model. In addition, training with unified structured prompter templates makes good use of the correlation among the information extraction subtasks.
The training method and apparatus of the information extraction model of the present disclosure will be described in more detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating an information extraction model training method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step S101, a corresponding structured prompter template may be first constructed for each of a plurality of joint information extraction tasks, thereby obtaining a plurality of structured prompter templates. Information extraction tasks, structured languages, and structured prompter templates, etc., according to exemplary embodiments of the present disclosure, will be described in detail below.
Fig. 2 is a schematic diagram illustrating an information extraction task according to an exemplary embodiment of the present disclosure.
As shown in fig. 2, the plurality of joint information extraction tasks according to the exemplary embodiment of the present disclosure may include at least one of the following tasks: entity identification tasks, relationship extraction tasks, event extraction tasks, and perspective extraction tasks. However, it should be understood that the above information extraction tasks are merely examples listed for ease of understanding, and that the exemplary embodiments of the present disclosure are not limited thereto, but that some other tasks related to information extraction may also be added as desired.
As shown in fig. 2, the multiple joint information extraction tasks may perform information extraction on the same corpus to obtain the corresponding information extraction results. For example, the entity recognition task may extract entities such as "person" and "organization" from the input text data; the relation extraction task may extract relationships such as "works at a company" from the same input text data; the event extraction task may extract events such as a "default (blow-up) event" or a "senior management change" from the same input text data; and the viewpoint extraction task may extract viewpoints such as "optimistic", "neutral", or "pessimistic" from the same input text data.
The extraction result of the information extraction task according to the exemplary embodiment of the present disclosure may be a structured information extraction language (which will be simply referred to as a structured language hereinafter). Here, the structured language means that contents extracted from text data have a predetermined structure or format, which will be explained in detail below.
In general, the information extraction task may consist of two atomic operations:
1) Positioning operation: for locating target information blocks (e.g., "entities" or, in event extraction, "trigger words") from the text (e.g., a sentence).
2) Association operation: for connecting different information blocks (e.g., relationships between entities, or the roles between events and their arguments) according to a desired association.
Thus, the language extracted by the information extraction task may be expressed in the following format:
(
  (information obtained by the "positioning operation": entity / trigger word, etc.
    (information obtained by the "association operation": relationship / event / viewpoint type 1)
    (information obtained by the "association operation": relationship / event / viewpoint type 2)
  )
)
By way of example only, for the text data "In 1997, Shi Jia happily took up the post of CEO of Company A", the following structured languages may be extracted:
(person: Shi Jia
  (works at: Company A)
)
(person: Shi Jia
  (viewpoint: happy)
)
(trigger event: appointment
  (hired party: Shi Jia)
  (hiring party: Company A)
  (time: 1997)
)
(organization/company: Company A)
(time: 1997)
"(person: Shi Jia (works at: Company A))" is the structured language obtained by the relation extraction task. "Person" is an entity type, and its concrete content is information obtained by the "positioning operation"; "works at" is a relationship type, and its concrete content is information obtained by the "association operation".
"(person: Shi Jia (viewpoint: happy))" is the structured language obtained by the viewpoint extraction task. "Person" is an entity type, whose concrete content is information obtained by the "positioning operation"; "viewpoint" is a relationship type, whose concrete content is information obtained by the "association operation".
"(trigger event: appointment (hired party: Shi Jia) (hiring party: Company A) (time: 1997))" is the structured language obtained by the event extraction task. "Trigger event" is an event type, whose concrete content is information obtained by the "positioning operation"; "hired party", "hiring party", and "time" are all relationship (argument) types, whose concrete contents are information obtained by the "association operation".
"(organization/company: Company A) (time: 1997)" is the structured language obtained by the entity recognition task. "Organization/company" and "time" are entity types, whose concrete contents are information obtained by the "positioning operation".
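To make the format concrete, the following Python sketch (not part of the original disclosure) shows one way such nested structured-language strings could be represented and parsed programmatically; the bracket grammar and the function name are assumptions for illustration only, since the patent fixes only the general "(type: content (type: content) ...)" shape.

```python
# Minimal sketch (assumption): parse a structured-language string such as
# "(person: Shi Jia (works at: Company A))" into a nested Python structure.

def parse_structured(text: str):
    """Parse one parenthesized block starting at '(' and return (node, rest)."""
    assert text.startswith("(")
    text = text[1:]
    label = []          # characters of "type: content" before any child block
    children = []       # nested (type, content, children) tuples
    while text:
        ch = text[0]
        if ch == "(":
            child, text = parse_structured(text)
            children.append(child)
        elif ch == ")":
            text = text[1:]
            break
        else:
            label.append(ch)
            text = text[1:]
    head = "".join(label).strip()
    type_, _, content = head.partition(":")
    return (type_.strip(), content.strip(), children), text

node, _ = parse_structured("(person: Shi Jia (works at: Company A))")
print(node)
# ('person', 'Shi Jia', [('works at', 'Company A', [])])
```

Such a parser would only be needed by downstream consumers of the extracted structured language; the model itself generates the string form directly.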
To obtain such structured language, a structured prompter template may be constructed and used such that an information extraction task according to an exemplary embodiment of the present disclosure is able to extract structured language from text data based on the cues of the corresponding structured prompter template. The construction of the structured prompter template will be described in detail below.
According to an exemplary embodiment of the present disclosure, the structured prompter template may be composed of at least one of a positioning operation prompt and an association operation prompt. The positioning operation prompt is used to obtain target information blocks in the text data, and the association operation prompt is used to obtain association information blocks that connect different information blocks according to a desired association. That is, an information block acquired from the text data based on a positioning operation prompt may be called a target information block, and an information block acquired based on an association operation prompt may be called an association information block. An association operation prompt may indicate the association between its acquired association information block and an association information block acquired according to another association operation prompt in the same structured prompter template, or the association between its acquired association information block and a target information block acquired according to a positioning operation prompt in the same structured prompter template. For example, in the previous example, the association operation prompt "viewpoint" indicates the association between the acquired association information block "happy" and the target information block "Shi Jia", and the association operation prompt "works at" indicates the association between the acquired association information block "Company A" and the target information block "Shi Jia".
In an exemplary embodiment of the present invention, the positioning operation prompt may include at least one piece of positioning operation prompt content, each consisting of positioning operation information (hereinafter denoted [DW]) and a positioning operation prompt text. Further, the association operation prompt includes at least one piece of association operation prompt content, each consisting of association operation information (hereinafter denoted [GL]) and an association operation prompt text.
Therefore, based on the above composition of the structured prompter template, the structured prompter template of each joint information extraction task may be constructed as follows: determining the content to be identified by the joint information extraction task; classifying the determined content to obtain the positioning operation prompt texts corresponding to the positioning operation prompts and the association operation prompt texts corresponding to the association operation prompts; and assigning positioning operation information and association operation information to the positioning operation prompt texts and the association operation prompt texts, respectively, to form the structured prompter template corresponding to the joint information extraction task. Here, when constructing the structured prompter template, the individual prompt contents (i.e., the positioning operation prompt contents and the association operation prompt contents) may be spliced together with reference to natural human language order.
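As a concrete illustration of the splicing step described above, the following Python sketch (an assumption, not part of the original disclosure) tags each prompt text with its operation marker and joins the pieces into one template string; the helper name and the example prompt texts are illustrative only.

```python
# Sketch (assumption): build a structured prompter template by tagging each
# prompt text with its operation marker and splicing the pieces together.
# "[DW]" marks positioning operation information, "[GL]" marks association
# operation information, as described above.

DW = "[DW]"   # positioning operation information
GL = "[GL]"   # association operation information

def build_prompter_template(locate_texts, associate_texts):
    """Splice positioning and association prompt contents into one template."""
    parts = [f"{DW} {t}" for t in locate_texts]
    parts += [f"{GL} {t}" for t in associate_texts]
    return " ".join(parts)

# Relation extraction task: locate places/companies/persons, then relate them.
relation_template = build_prompter_template(
    locate_texts=["place", "company", "person"],
    associate_texts=["is located in", "works at"],
)
print(relation_template)
# [DW] place [DW] company [DW] person [GL] is located in [GL] works at
```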
By way of example only, taking an extraction model for important micro-level events in the securities industry as an example, the prompter templates may be defined in the form of Table 1 below:
[Table 1]
[Table 1 is reproduced as an image in the original publication and lists an example structured prompter template for each joint information extraction task; it is not included in this text.]
In Table 1, by way of example only, "[DW] place [DW] company [DW] person [GL] is located in [GL] works at" is the prompter template constructed for the relation extraction task. It is composed of the positioning operation prompt "[DW] place [DW] company [DW] person" and the association operation prompt "[GL] is located in [GL] works at". "[DW] place", "[DW] company", and "[DW] person" are positioning operation prompt contents (where [DW] is the positioning operation information, and "place", "company", and "person" are the positioning operation prompt texts). Similarly, "[GL] is located in" and "[GL] works at" are association operation prompt contents (where [GL] is the association operation information, and "is located in" and "works at" are the association operation prompt texts).
However, it should be understood that the prompter templates shown in Table 1 above are merely examples, and the present application is not limited thereto; in practice many more entities, relationships, and events may be identified in a scenario (an event extraction scenario, for example, may typically involve about 3 to 10 event types).
After the structured prompter templates for the respective joint information extraction tasks are constructed as above, first training sample data and second training sample data may be acquired in step S102. Here, the first training sample data may include first text data (e.g., original text or sentences) and a first structured language label corresponding to the first text data, and the second training sample data may include second text data (e.g., original text or sentences) and a second structured language label corresponding to the second text data. Here, the structured language label corresponding to text data may be a structured language whose content corresponds to the prompts of the corresponding structured prompter template. For example, for the text data "Shi Jia happily serves as CEO of Company A", and assuming that the structured prompter template set for the relation extraction task is "[DW] person [GL] works at", the corresponding structured language label may be annotated as "(person: Shi Jia (works at: Company A))".
Further, in exemplary embodiments of the present invention, the first training sample data and the second training sample data may be the same or different. For example, the first training sample data may be training sample data for a wide variety of natural language processing domains, while the second training sample data may be training sample data for a particular (e.g., desired) domain. Further, in case the amount of the domain data is sufficient, the first training sample data and the second training sample data may be the same training sample data for the same domain or different training sample data for the same domain.
In an exemplary embodiment of the present invention, the structured language labels corresponding to text data may be set by manual annotation or similar means based on each constructed structured prompter template. In addition, in the training sample data, different structured language labels may be set (e.g., annotated) for different information extraction tasks over the same corpus. For example, for the text data "In 1997, Shi Jia happily took up the post of CEO of Company A", a structured language label "(person: Shi Jia (works at: Company A))" may be annotated for the relation extraction task, a structured language label "(organization/company: Company A) (time: 1997)" for the entity recognition task, a structured language label "(trigger event: appointment (hired party: Shi Jia) (hiring party: Company A) (time: 1997))" for the event extraction task, and a structured language label "(person: Shi Jia (viewpoint: happy))" for the viewpoint extraction task.
According to an exemplary embodiment of the present invention, the first training sample data and the second training sample data may each include positive samples, each positive sample including text data and a plurality of correct structured language labels corresponding to the text data, the plurality of correct structured language labels being set respectively for the plurality of joint information extraction tasks; a correct structured language label is a structured language whose content is correct according to the prompts of the corresponding structured prompter template, for example "(person: Shi Jia (works at: Company A))".
In addition, the first training sample data and the second training sample data may each further include negative samples generated based on the positive samples; a negative sample may include text data and an erroneous structured language label corresponding to the text data, the erroneous structured language label being a structured language generated by randomly introducing errors into the content of the correct structured language label of a positive sample. For example, a negative sample may be generated by randomly generating some erroneous combinations from a positive sample: if the correct combination is "(person: Shi Jia (works at: Company A))", an erroneous combination may be "(company: Shi Jia (is located in: Company A))".
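The following Python sketch (an assumption for illustration) shows one way such negative samples could be derived from positive ones by randomly corrupting the type words of a correct structured language label; the candidate type lists and the corruption strategy are illustrative only.

```python
import random

# Sketch (assumption): derive a negative sample from a positive one by
# randomly replacing type words in the correct structured-language label,
# e.g. "(person: Shi Jia (works at: Company A))" ->
#      "(company: Shi Jia (is located in: Company A))".

ENTITY_TYPES = ["person", "company", "place", "organization/company"]
RELATION_TYPES = ["works at", "is located in", "viewpoint"]

def corrupt_label(correct_label: str, rng: random.Random) -> str:
    corrupted = correct_label
    for types in (ENTITY_TYPES, RELATION_TYPES):
        present = [t for t in types if f"({t}:" in corrupted]
        for t in present:
            wrong = rng.choice([x for x in types if x != t])
            corrupted = corrupted.replace(f"({t}:", f"({wrong}:", 1)
    return corrupted

rng = random.Random(0)
positive = "(person: Shi Jia (works at: Company A))"
negative = corrupt_label(positive, rng)
print(negative)   # e.g. "(company: Shi Jia (is located in: Company A))"
```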
With continued reference to fig. 1, at step S103, the reference model may be pre-trained based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-trained model. In an exemplary embodiment of the present invention, the reference model may include an encoder and a decoder.
According to an exemplary embodiment of the present disclosure, the reference model may be, by way of example only, a generative model. For example, the reference model may employ a T5-base model that has already been pre-trained, and in an exemplary embodiment of the present disclosure a second round of pre-training may be applied to it in step S103. This secondary pre-training process will be described in detail below.
The pre-training of step S103 may comprise three different pre-training tasks:
1) Text data to structured language pre-training
In the text-data-to-structured-language pre-training task, the reference model performs structured information extraction processing on the input first text data based on the prompts of the input structured prompter templates to obtain a first structured language.
More specifically, in this pre-training task the input to the reference model is the structured prompter template plus the text data, and the output is the structured language generated by the reference model.
Here, the format of the input data (i.e., structured prompter template + text data) may be set as follows:
Entity extraction task: positioning operation information + entity category + text data
Relation extraction task: positioning operation information + entity category + association operation information + relation category + text data
Event extraction task: positioning operation information + event category + association operation information + argument category + text data
Viewpoint extraction task: positioning operation information + evaluation dimension + association operation information + viewpoint category + text data
Here, the entity category, relation category, event category, argument category, evaluation dimension, and viewpoint category are the prompt texts associated with the corresponding positioning operation information ([DW]) and association operation information ([GL]). By way of example only, the input data may be "[DW] person [GL] works at + TEXT (In 1997, Shi Jia happily took up the post of CEO of Company A)".
Text-data-to-structured-language pre-training is the basis of the entire pre-training phase. It aims to enable the reference model to generate the structured language given correct input, and also to enable it to recognize erroneous combinations of structural information. Therefore, in this training step the reference model may be trained using negative samples in addition to positive samples. The text data of the positive and negative samples may be the same while their structured language labels differ; the method of obtaining negative samples has been described above and is not repeated here for brevity.
In an exemplary embodiment of the present invention, the dataset used for text-data-to-structured-language pre-training may be defined as D_1, with x as the input and y as the output; θ_e and θ_d are the encoder parameters and decoder parameters, respectively, and θ_q denotes the other model parameters. The corresponding first loss function may be denoted L_1, i.e., the negative log-likelihood over D_1:
L_1 = -∑_{(x,y)∈D_1} log P(y | x; θ_e, θ_d, θ_q)
Here, θ_e, θ_d, and θ_q may each denote a set of parameters rather than a single parameter.
As above, since the first loss function L_1 involves the encoder parameters θ_e, the decoder parameters θ_d, and the model parameters θ_q of the reference model, text-data-to-structured-language pre-training can be used to optimize the encoder parameters θ_e and the decoder parameters θ_d in addition to the model parameters θ_q. Here, the model parameters θ_q denote the parameters of the reference model other than the encoder parameters θ_e and the decoder parameters θ_d.
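As one possible instantiation (an assumption rather than the patented implementation), the sketch below computes such a sequence-to-sequence negative log-likelihood with a Hugging Face T5 checkpoint, with the prompter template concatenated to the text as input and the structured language as the label; the checkpoint name and example strings are illustrative.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Sketch (assumption): one training step of the "text data to structured
# language" pre-training task. The patent only requires a pre-trained
# encoder-decoder (e.g. a T5-base model); the token-level cross-entropy
# returned by the model plays the role of L_1.

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

prompter = "[DW] person [GL] works at"
text = "In 1997, Shi Jia happily took up the post of CEO of Company A"
target = "(person: Shi Jia (works at: Company A))"

inputs = tokenizer(prompter + " " + text, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)
loss_1 = outputs.loss            # negative log-likelihood of the target, ~ L_1
loss_1.backward()                # gradients flow to encoder, decoder and the
                                 # remaining model parameters
```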
2) Structured language generation pre-training
According to an exemplary embodiment of the present disclosure, structured language generation pre-training takes as input the first structured language label with part of the structured language deleted, and the reference model performs completion processing on it to obtain the completed structured language.
Specifically, this pre-training step focuses mainly on the reference model's ability to generate the structured language. The input to the reference model here uses only the structured language: a portion of it (e.g., its front portion) is given as input, the remaining portion is generated, and the complete structured language is obtained (e.g., the input is "(person: Shi Jia" and the output is "(person: Shi Jia (works at: Company A))"). This pre-training step may train only the decoder portion of the reference model, so that the decoder learns the syntax of the structured language.
For example only, the dataset used for structured language generation pre-training may be defined as D_2, with y_{<i} (the front portion of a structured language in the dataset) as the input and y (the complete generated structured language) as the output. The corresponding second loss function may be denoted L_2, i.e., the negative log-likelihood of the completed tokens:
L_2 = -∑_{y∈D_2} ∑_i log P(y_i | y_{<i}; θ_d, θ_q)
As above, since the second loss function L_2 involves the decoder parameters θ_d and the model parameters θ_q of the reference model, structured language generation pre-training can optimize the decoder parameters θ_d of the reference model in addition to the model parameters θ_q.
3) Semantic coding pre-training
Semantic coding pre-training takes as input the first text data with part of its data erased, and the reference model performs text prediction processing on it to obtain the predicted (erased) part.
By way of example only, to give the reference model basic semantic coding capability, an unsupervised masked language model training task may be performed in this pre-training step, constructing samples of the form (None, text', text'') from the unstructured raw text data. 15% of the tokens in the original sentence may be erased (masked), and the erased portions are then generated, i.e., the erased text is output based on the portions still present in the input. Here, the dataset used for semantic coding pre-training (i.e., the original text data) may be defined as D_3, and the corresponding third loss function may be denoted L_3, i.e., the negative log-likelihood of the erased portions:
L_3 = -∑_{x∈D_3} log P(x'' | x'; θ_e, θ_d, θ_q)
where x is the original text data, x' is the text after masking (with the erased portions removed), and x'' is the erased (masked-out) portion.
As above, since the third loss function L_3 involves the encoder parameters θ_e, the decoder parameters θ_d, and the model parameters θ_q of the reference model, semantic coding pre-training can also optimize the encoder parameters θ_e and the decoder parameters θ_d of the reference model in addition to the model parameters θ_q.
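A minimal sketch of how the (masked text, erased spans) pairs for this task could be constructed is given below; it erases roughly 15% of whitespace tokens and uses T5-style sentinel markers, which is an assumption made here because the reference model above is T5-based, while the patent itself only specifies a masked-language-model objective.

```python
import random

# Sketch (assumption): build one (x', x'') pair for semantic coding
# pre-training by erasing roughly 15% of the tokens of the original text x.
# Sentinel tokens (<extra_id_0>, <extra_id_1>, ...) mark the erased slots.

def mask_tokens(text: str, ratio: float = 0.15, seed: int = 0):
    rng = random.Random(seed)
    tokens = text.split()
    n_mask = max(1, int(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))

    masked, erased, sentinel = [], [], 0
    for i, tok in enumerate(tokens):
        if i in positions:
            masked.append(f"<extra_id_{sentinel}>")
            erased.append(f"<extra_id_{sentinel}> {tok}")
            sentinel += 1
        else:
            masked.append(tok)
    return " ".join(masked), " ".join(erased)

x = "In 1997, Shi Jia happily took up the post of CEO of Company A"
x_masked, x_erased = mask_tokens(x)
print(x_masked)   # x'  : text with ~15% of tokens replaced by sentinels
print(x_erased)   # x'' : the erased tokens, each preceded by its sentinel
```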
Finally, the total loss function L_all of the reference model may be the sum of the loss functions of the three pre-training tasks, as follows:
L_all = L_1 + L_2 + L_3
However, it should be understood that the above relationship between the total loss function and the loss functions of the three pre-training tasks is only an example; the present application is not limited thereto, and the total loss function may be constructed from the first loss function, the second loss function, and the third loss function in other ways.
In an embodiment of the present invention, when the pre-training includes the above three pre-training tasks, the training sample data for the three pre-training tasks may be constructed from the first training sample data as follows.
For example only, the first text data in the first training sample data may be used as the input data of the training sample data for text-data-to-structured-language pre-training, and the label in that training sample data may be the first structured language label corresponding to the first text data.
Further, by way of example only, a portion of the first structured language label in the first training sample data (i.e., the first structured language label with part of the structured language deleted) may be truncated and used as the input data of the training sample data for structured language generation pre-training, and the label in that training sample data may be the full first structured language label.
Further, by way of example only, the first text data with part of its data erased, obtained after erasing part of the data in the first text data of the first training sample data, may be used as the input data of the training sample data for semantic coding pre-training, and the label in that training sample data may be the first text data.
In an exemplary embodiment of the present invention, the reference model may be pre-trained based on the first training sample data and the plurality of structured prompter templates as follows: performing text-data-to-structured-language pre-training and determining the first loss function based on the first structured language and the first structured language label; performing structured language generation pre-training and determining the second loss function based on the completed structured language and the first structured language label; performing semantic coding pre-training and determining the third loss function based on the erased part data and the predicted part data; and adjusting the reference model based on the first loss function, the second loss function, and the third loss function to obtain the pre-training model.
That is, the training sample data constructed for the text-data-to-structured-language pre-training task may first be input to the reference model to obtain the first loss function; the training sample data constructed for the structured language generation pre-training task is then input to the reference model to obtain the second loss function; and the training sample data constructed for the semantic coding pre-training task is then input to the reference model to obtain the third loss function. Thereafter, the reference model may be adjusted based on the first, second, and third loss functions together (e.g., based on a weighted sum of them) to obtain the trained pre-training model.
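A compact sketch of such a combined adjustment step is shown below; it assumes each batch is already tokenized for its respective task and that the model returns a loss in the Hugging Face style, and the default equal weighting simply mirrors L_all = L_1 + L_2 + L_3 above.

```python
# Sketch (assumption): one combined pre-training step. batch_1, batch_2 and
# batch_3 are assumed to be tokenized batches drawn from the datasets D_1,
# D_2 and D_3 described above, each containing input_ids and labels.

def pretraining_step(model, optimizer, batch_1, batch_2, batch_3,
                     w1=1.0, w2=1.0, w3=1.0):
    loss_1 = model(**batch_1).loss   # text data -> structured language
    loss_2 = model(**batch_2).loss   # structured language completion
    loss_3 = model(**batch_3).loss   # masked-language-model prediction
    loss_all = w1 * loss_1 + w2 * loss_2 + w3 * loss_3

    optimizer.zero_grad()
    loss_all.backward()
    optimizer.step()
    return loss_all.item()
```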
Although the pre-training process is described in the above description in terms of an order of text data to structured language pre-training, structured language generation pre-training, and semantic coding pre-training, the order of execution of the three pre-training tasks is not limited thereto and may be performed in other orders as desired according to the exemplary embodiments of the present disclosure.
After the pre-training is completed as above, in step S104, the pre-training model may be subjected to fine-tuning training (e.g., fine-tune training) based on the second training sample data and the plurality of structured prompter templates constructed, to obtain a trained information extraction model. An information extraction model according to an exemplary embodiment of the present disclosure may extract structured languages respectively corresponding to a plurality of joint information extraction tasks from input text data based on hints of a structured hinter template constructed. In other words, the information extraction model according to the exemplary embodiments of the present disclosure is a model capable of generating an accurate structured language from input text data and a structured prompter template.
In an exemplary embodiment of the present invention, in step S104, structured information extraction processing may be performed on the second text data in the second training sample data through the pre-training model, based on the prompts of the plurality of structured prompter templates, to obtain a second structured language; a fine-tuning loss function is determined based on the second structured language and the second structured language label, and the model parameters of the pre-training model are adjusted based on the fine-tuning loss function to obtain the trained information extraction model.
According to an exemplary embodiment of the present disclosure, the training method of the fine-tuning stage is substantially the same as that of the text-data-to-structured-language pre-training of the pre-training stage, except that in fine-tuning training the encoder and decoder of the pre-training model are not trained. That is, the fine-tuning loss function used in the fine-tuning training does not involve the encoder parameters and decoder parameters of the pre-training model.
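One way to realize this constraint (an assumption, not the patented implementation) is to freeze the encoder and decoder blocks of the pre-trained model and hand only the remaining parameters to the optimizer, as sketched below; the checkpoint path is hypothetical and exactly which parameters stay trainable depends on the concrete architecture.

```python
import torch
from transformers import T5ForConditionalGeneration

# Sketch (assumption): implement the "encoder and decoder are not trained"
# constraint by freezing their transformer blocks, so that fine-tuning only
# updates the remaining model parameters (theta_q in the notation above).

model = T5ForConditionalGeneration.from_pretrained("path/to/pretrained-model")

for stack in (model.encoder.block, model.decoder.block):
    for p in stack.parameters():
        p.requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)   # fine-tuning optimizer
```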
Further, it should be understood that the above execution sequence of steps S101 to S104 is merely an example, and the present application is not limited thereto, and for example, after the structured prompter template is constructed, the first training data may be acquired first, the pre-training process described above is performed based on the first training sample data, then the second training data is acquired, and the fine-tuning training process described above is performed based on the second training sample data.
The information extraction model trained in the above manner and the structured prompter template constructed can be used to obtain structured language from the original text in the prediction stage. A method of information extraction using an information extraction model according to an exemplary embodiment of the present disclosure will be described below in conjunction with fig. 3.
Fig. 3 is a flowchart illustrating an information extraction method according to an exemplary embodiment of the present disclosure.
Referring to fig. 3, in step S301, text data to be processed may be acquired, and a plurality of structured prompter templates respectively constructed for a plurality of joint information extraction tasks. Here, the plurality of joint information extraction tasks may include at least one of the following tasks: entity identification tasks, relationship extraction tasks, event extraction tasks, and perspective extraction tasks.
According to an exemplary embodiment of the present disclosure, the plurality of structured prompter templates acquired in step S301 may be structured prompter templates constructed in the training phase as described above, and may also include structured prompter templates constructed additionally. That is, the structured prompter template may be composed of at least one of a positioning operation prompt that may be used to obtain a target information block in the text data and an association operation prompt that may be used to obtain an associated information block in the text data that connects different information blocks according to a desired association. In addition, the location operation prompt may include at least one location operation prompt content, each location operation prompt content being composed of location operation information and location operation prompt text, and the associated operation prompt may include at least one associated operation prompt content, each associated operation prompt content being composed of associated operation information and associated operation prompt text. These have been described in detail above and will therefore not be described in detail here for the sake of brevity.
Then, in step S302, the text data to be processed and the plurality of structured prompter templates may be input into the information extraction model to obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks. In an exemplary embodiment of the present disclosure, the information extraction model used in step S302 may extract the structured languages respectively corresponding to the plurality of joint information extraction tasks from the input text data based on the prompts of the input structured prompter templates. The information extraction model used in this step is a model obtained by training with reference to the training method of fig. 1 and is not described in more detail here.
Further, by way of example only, the input format of the information extraction model may be as follows:
Entity extraction task: positioning operation information, entity category and text data
Relation extraction task: positioning operation information, entity category, associated operation information, relation category and text data
Event extraction task: positioning operation information, event category, associated operation information, argument category and text data
Perspective extraction task: positioning operation information, evaluation dimension, associated operation information, viewpoint category and text data
The output of the information extraction model is the structured language.
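By way of illustration only, and continuing the hypothetical StructuredPrompterTemplate sketch above, assembling a model input following the formats listed above might look as follows; the single-space separator is an assumption made for this sketch.

```python
def build_model_input(template: StructuredPrompterTemplate, text_data: str) -> str:
    """Concatenate operation information, prompt text and text data into one input string."""
    parts = []
    # Positioning part, e.g. positioning operation information + entity category / event category.
    for p in template.positioning_prompts:
        parts += [p.operation_info, p.prompt_text]
    # Association part, e.g. associated operation information + relation category / argument category.
    for p in template.association_prompts:
        parts += [p.operation_info, p.prompt_text]
    parts.append(text_data)
    return " ".join(parts)
```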
Fig. 4 is a schematic diagram schematically illustrating information extraction according to an exemplary embodiment of the present disclosure.
As shown in FIG. 4, the left side of the model is the input structured prompter template+text data, and the right side of the model is the structured language generated by the model.
In this way, each information extraction task obtains extracted text with a uniform structure, so that the extracted text does not need to be re-structured after the information extraction task is completed on account of non-uniform output structures.
Fig. 5 is a block diagram illustrating an information extraction model training apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 5, an information extraction model training apparatus 500 according to an exemplary embodiment of the present disclosure includes a template construction unit 510, a training data acquisition unit 520, and a training unit 530.
In an exemplary embodiment of the present disclosure, the template construction unit 510 may be configured to construct a respective structured prompter template for each of a plurality of joint information extraction tasks, obtaining a plurality of structured prompter templates.
According to an exemplary embodiment of the present disclosure, the plurality of joint information extraction tasks may include at least one of the following tasks: entity identification tasks, relationship extraction tasks, event extraction tasks, and perspective extraction tasks.
According to an exemplary embodiment of the present disclosure, the structured prompter template may include a positioning operation prompt, which is a prompt for acquiring a target information block in the text data, and/or an association operation prompt, which is a prompt for acquiring an association information block in the text data, which is used to indicate an association between different information blocks.
According to an exemplary embodiment of the present disclosure, the positioning operation prompt may include at least one positioning operation prompt content, each consisting of positioning operation information and positioning operation prompt text. The associated operation prompt may include at least one associated operation prompt content, each of the associated operation prompt content being composed of associated operation information and associated operation prompt text.
According to an exemplary embodiment of the present disclosure, the template construction unit 510 may construct a corresponding structured prompter template for each joint information extraction task by: determining the content to be identified by the joint information extraction task; classifying the determined content to be identified to obtain a positioning operation prompt text corresponding to the positioning operation prompt and an associated operation prompt text corresponding to the associated operation prompt; and distributing positioning operation information and associated operation information to the positioning operation prompt text and the associated operation prompt text respectively to form a structured prompter template corresponding to the joint information extraction task.
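By way of illustration only, the three construction steps performed by the template construction unit 510 might be sketched as follows, reusing the hypothetical container classes introduced earlier; the classification rule and the operation markers ("[LOCATE]", "[ASSOCIATE]") are assumptions and not markers defined by the present disclosure.

```python
def construct_prompter_template(task: str, contents_to_identify) -> StructuredPrompterTemplate:
    """Build a structured prompter template for one joint information extraction task."""
    template = StructuredPrompterTemplate(task=task)
    # Step 1 (determining the content to identify) is assumed to have produced
    # (name, kind) pairs, e.g. ("person", "target") or ("works_for", "association").
    for name, kind in contents_to_identify:
        if kind == "target":
            # Steps 2/3: content classified as a target information block becomes a
            # positioning operation prompt text with assigned positioning operation information.
            template.positioning_prompts.append(PromptContent("[LOCATE]", name))
        else:
            # Content describing a connection between information blocks becomes an
            # associated operation prompt text with assigned associated operation information.
            template.association_prompts.append(PromptContent("[ASSOCIATE]", name))
    return template
```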
In an exemplary embodiment of the present disclosure, the training data obtaining unit 520 may be configured to obtain first training sample data and second training sample data, wherein the first training sample data may include first text data and a first structured language tag corresponding to the first text data, and the second training sample data may include second text data and a second structured language tag corresponding to the second text data.
According to an exemplary embodiment of the present disclosure, the first training sample data and the second training sample data may each include a positive sample, and the positive sample may include text data and a plurality of correctly structured language tags corresponding to the text data, the plurality of correctly structured language tags being respectively set for the plurality of joint information extraction tasks, the correctly structured language tags being structured languages having correct contents according to prompts of corresponding structured prompter templates.
According to an exemplary embodiment of the present disclosure, the first training sample data and the second training sample data may each further include a negative sample generated based on the positive sample, wherein the negative sample may include text data and an error structured language tag corresponding to the text data, the error structured language tag being a structured language generated by introducing random errors into the contents of the correct structured language tag of the positive sample.
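By way of illustration only, generating such a negative sample by introducing a random error into a correct structured language label might look like the following sketch; treating the label as a sequence of tokens and the single-token replacement rule are assumptions made here.

```python
import random


def make_negative_sample(text_data: str, correct_label_tokens: list, vocabulary: list) -> dict:
    """Corrupt one token of a correct structured language label to obtain an error label."""
    corrupted = list(correct_label_tokens)
    position = random.randrange(len(corrupted))
    corrupted[position] = random.choice(vocabulary)  # replace one token with a random (wrong) token
    return {"text": text_data, "structured_label": corrupted, "is_positive": False}
```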
In an exemplary embodiment of the present disclosure, the training unit 530 may be configured to pre-train the reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model, and to perform fine-tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model. The information extraction model according to an exemplary embodiment of the present disclosure may extract structured languages respectively corresponding to the plurality of joint information extraction tasks from input text data based on the prompts of the constructed structured prompter templates.
According to an example embodiment of the disclosure, the reference model may be a generative model.
According to an example embodiment of the present disclosure, the pre-training may include: text data to structured language pre-training, which is used for carrying out structured information extraction processing on the input first text data based on the prompt of the input structured prompter template through the reference model to obtain a first structured language; the method comprises the steps of generating pre-training of a structured language, wherein the pre-training is used for taking a first structured language label with a part of the structured language deleted as input, and carrying out complement processing on the first structured language label with the part of the structured language deleted through the reference model to obtain a complement structured language; and semantic coding pre-training, which is used for taking the first text data with the part data erased as input, and executing text prediction processing on the first text data with the part data erased through the reference model to obtain the predicted part data.
According to an exemplary embodiment of the present disclosure, the semantic coding pre-training may use an unsupervised masked language model training method.
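By way of illustration only, one (input, target) pair per pre-training objective could be constructed as in the following sketch; the deletion ratio, the single-word masking and the token-level treatment of the structured language label are assumptions introduced here and are not prescribed by the present disclosure.

```python
import random


def make_pretraining_pairs(first_text: str, first_label_tokens: list, template,
                           mask_token: str = "[MASK]"):
    """Return one training pair for each of the three pre-training objectives above."""
    # 1) Text data to structured language: prompter template + first text data -> first label.
    extraction_pair = ((template, first_text), first_label_tokens)

    # 2) Structured language generation: delete part of the label and ask the
    #    model to complete it back to the full first structured language label.
    keep = max(1, int(len(first_label_tokens) * 0.7))
    generation_pair = (first_label_tokens[:keep], first_label_tokens)

    # 3) Semantic coding (masked language model style): erase one word of the
    #    first text data and predict the erased portion.
    words = first_text.split()
    erased_index = random.randrange(len(words))
    masked_words = list(words)
    masked_words[erased_index] = mask_token
    semantic_pair = (" ".join(masked_words), words[erased_index])

    return extraction_pair, generation_pair, semantic_pair
```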
According to an example embodiment of the present disclosure, the reference model may include an encoder and a decoder, wherein pre-training the reference model based on the first training sample data and the plurality of structured prompter templates comprises: performing the text data to structured language pre-training and determining a first loss function based on the first structured language and the first structured language tag, wherein the first loss function is related to encoder parameters, decoder parameters and model parameters of the reference model; performing the structured language generation pre-training and determining a second loss function based on the completed structured language and the first structured language tag, wherein the second loss function is related to decoder parameters and model parameters of the reference model; performing the semantic coding pre-training and determining a third loss function based on the erased portion data and the predicted portion data, wherein the third loss function is related to encoder parameters, decoder parameters and model parameters of the reference model; and adjusting the reference model based on the first loss function, the second loss function and the third loss function to obtain the pre-training model.
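By way of illustration only, combining the three loss functions into one joint pre-training update might be sketched as follows; the equal weighting of the losses and the call signatures of the hypothetical reference model (its decode method in particular) are assumptions made for this sketch.

```python
def pretraining_step(reference_model, optimizer, pairs, loss_fn) -> float:
    """One illustrative joint pre-training step over the three objectives above."""
    extraction_pair, generation_pair, semantic_pair = pairs

    # First loss: text data to structured language (relates to encoder, decoder
    # and model parameters of the reference model).
    (template, text), label = extraction_pair
    first_loss = loss_fn(reference_model(template, text), label)

    # Second loss: structured language completion (relates to decoder and model
    # parameters only; the hypothetical decode() path bypasses the encoder).
    partial_label, full_label = generation_pair
    second_loss = loss_fn(reference_model.decode(partial_label), full_label)

    # Third loss: predict the erased portion of the text (relates to encoder,
    # decoder and model parameters); no prompter template is used here (assumption).
    masked_text, erased_portion = semantic_pair
    third_loss = loss_fn(reference_model(None, masked_text), erased_portion)

    # Adjust the reference model based on all three losses (equal weights assumed).
    total_loss = first_loss + second_loss + third_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```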
According to an exemplary embodiment of the present disclosure, the fine tuning training may include: and carrying out structured information extraction processing on second text data in the second training sample data based on prompts of the structured prompter templates through the pre-training model to obtain a second structured language, determining a fine tuning loss function based on the second structured language and the second structured language label, and adjusting model parameters of the pre-training model based on the fine tuning loss function to obtain the information extraction model.
The operations performed by the above respective units have been described in detail above with reference to fig. 1, and thus, for brevity, will not be explained here.
Fig. 6 is a block diagram illustrating an information extraction apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 6, an information extraction apparatus 600 according to an exemplary embodiment of the present disclosure may include an input unit 610 and an information extraction unit 620.
In an exemplary embodiment of the present disclosure, the input unit 610 may be configured to acquire text data to be processed and a plurality of structured prompter templates respectively constructed for a plurality of joint information extraction tasks.
Further, in an exemplary embodiment of the present disclosure, the information extraction unit 620 may be configured to input the text data to be processed and the plurality of structured prompter templates into the information extraction model to obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks. According to an exemplary embodiment of the present disclosure, the information extraction model may extract the structured languages respectively corresponding to the plurality of joint information extraction tasks from the input text data based on the prompts of the constructed structured prompter templates.
According to an exemplary embodiment of the present disclosure, the information extraction model may be a model obtained by training with the foregoing information extraction model training method.
The operations performed by the above respective units have been described in detail above with reference to fig. 3, and thus, for brevity, will not be explained here.
Fig. 7 is an application scenario illustrating an information extraction model according to an exemplary embodiment of the present disclosure.
As shown in fig. 7, the information extraction model according to the exemplary embodiment of the present disclosure may be applied to, for example, a news scenario. In this scenario, news from major information websites may first be obtained by means such as a crawler (701), and the obtained news is filtered and preprocessed (702) and then input into the information extraction model of the present disclosure (703) to obtain structured languages (704). These structured languages may be written into the corresponding databases according to rules (705) and then wait for service invocation (706).
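By way of illustration only, the flow of this news scenario (reference numerals 701 to 706) might be orchestrated as in the following sketch; every callable passed in is a placeholder supplied by the caller, not a component defined by the present disclosure.

```python
def news_extraction_pipeline(urls, crawl, preprocess, extraction_model, templates, write_to_database):
    """Illustrative end-to-end flow for the news scenario described above."""
    for url in urls:
        raw_news = crawl(url)                            # 701: obtain news, e.g. via a crawler
        text = preprocess(raw_news)                      # 702: filter and preprocess the news
        structured = extraction_model(text, templates)   # 703/704: obtain the structured language
        write_to_database(structured)                    # 705: write to the database according to rules
    # 706: the stored structured language then waits for service invocation
```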
It should be appreciated that the above application scenario is only an example, and the present disclosure is also applicable to any other scenario with similar requirements.
Fig. 8 is a diagram illustrating an implementation environment of an information extraction model training method and an information extraction method according to an exemplary embodiment of the present disclosure.
As shown in fig. 8, an implementation environment of the information extraction model training method according to an exemplary embodiment of the present disclosure may include a terminal device 810, or include the terminal device 810 and a server device 820 connected to the terminal device through a wired or wireless network or the like. The terminal device 810 may be an electronic device capable of applying an information extraction function, such as a smart phone, a notebook computer, a desktop computer, a smart watch, or the like. Server device 820 may be a platform device consisting of one or more servers, or a virtual server platform.
The terminal device 810 can independently implement the foregoing information extraction model training method, and then apply the trained information extraction model to perform information extraction.
Alternatively, the terminal device 810 may implement the aforementioned information extraction model training method together with the server device 820. For example, the terminal device 810 may obtain training sample data for training the model from the server device 820, and the model training process may be implemented on the terminal device 810. For another example, the terminal device 810 may load, directly from the server device 820, an information extraction model trained on the server device 820, or an information extraction model that the server device 820 acquired from another device after training, and then perform the information extraction task using the information extraction model.
Fig. 9 is a block diagram illustrating an electronic device 900 according to an exemplary embodiment of the present disclosure.
According to embodiments of the present disclosure, an electronic device may be provided. As shown in fig. 9, the electronic device 900 includes at least one memory 901 and at least one processor 902, the at least one memory 901 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 902, perform an information extraction model training method and/or an information extraction method according to an embodiment of the present disclosure.
By way of example, the electronic device 900 may be a PC, a tablet device, a personal digital assistant, a smart phone, or another device capable of executing the above-described set of instructions. Here, the electronic device 900 is not necessarily a single electronic device, but may be any apparatus or collection of circuits capable of executing the above-described instructions (or instruction set) individually or in combination. The electronic device 900 may also be part of an integrated control system or system manager, or may be a portable electronic device configured to interface locally or remotely (e.g., via wireless transmission).
In electronic device 900, processor 902 may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor 902 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor 902 may execute instructions or code stored in the memory, wherein the memory 901 may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory 901 may be integrated with the processor 902, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. In addition, memory 901 may include a stand-alone device, such as an external disk drive, storage array, or other storage device usable by any database system. The memory 901 and the processor 902 may be operatively coupled or may communicate with each other, for example, through an I/O port, network connection, etc., such that the processor 902 is able to read files stored in the memory 901.
In addition, the electronic device 900 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein the instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the information extraction model training method and/or the information extraction method of the embodiments of the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid state drives (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards or eXtreme Digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage, hard disks, solid-state disks, and any other means configured to store a computer program and any associated data, data files and data structures in a non-transitory manner and to provide the computer program and any associated data, data files and data structures to a processor or computer so that the processor or computer can execute the program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer device, such as a client, host, proxy device, server, etc.; further, in one example, the computer program and any associated data, data files and data structures are distributed across networked computer systems such that the computer program and any associated data, data files and data structures are stored, accessed and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there is also provided a computer program product comprising computer instructions which, when executed by a processor, implement the information extraction model training method and/or the information extraction method of the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. An information extraction model training method, characterized in that the method comprises the following steps:
constructing a corresponding structured prompter template for each task in the multiple joint information extraction tasks to obtain multiple structured prompter templates;
Acquiring first training sample data and second training sample data; the first training sample data comprises first text data and a first structured language label corresponding to the first text data, and the second training sample data comprises second text data and a second structured language label corresponding to the second text data;
pre-training the reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model;
performing fine tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model;
the information extraction model is used for extracting structured languages respectively corresponding to the plurality of joint information extraction tasks from input text data based on prompt of the constructed structured prompter template.
2. The information extraction model training method of claim 1, wherein the structured prompter template comprises a positioning operation prompt and/or an associated operation prompt;
the positioning operation prompt is a prompt for acquiring a target information block in text data;
The association operation prompt is a prompt for acquiring an association information block in text data, wherein the association information block is used for indicating association between different information blocks.
3. The information extraction model training method as claimed in claim 2, wherein:
the positioning operation prompt comprises at least one positioning operation prompt content, and each positioning operation prompt content consists of positioning operation information and positioning operation prompt text;
the associated operation prompt comprises at least one associated operation prompt content, and each associated operation prompt content is composed of associated operation information and associated operation prompt text.
4. The information extraction model training method as claimed in claim 1, wherein the specific implementation manner of constructing the corresponding structured prompter template for each joint information extraction task is as follows:
determining the content to be identified by the joint information extraction task;
classifying the determined content to be identified to obtain a positioning operation prompt text corresponding to the positioning operation prompt and an associated operation prompt text corresponding to the associated operation prompt;
and distributing positioning operation information and associated operation information to the positioning operation prompt text and the associated operation prompt text respectively to form a structured prompter template corresponding to the joint information extraction task.
5. The information extraction model training method of claim 1, wherein the first training sample data and the second training sample data each comprise positive samples;
the positive sample comprises text data and a plurality of correct structured language labels corresponding to the text data, the correct structured language labels are respectively set for the plurality of joint information extraction tasks, and the correct structured language labels are structured languages with correct contents according to prompts of corresponding structured prompter templates.
6. The information extraction model training method of claim 5, wherein the first training sample data and the second training sample data each further comprise negative samples generated based on the positive samples;
the negative sample comprises text data and an error structured language label corresponding to the text data, wherein the error structured language label is a structured language generated by generating random errors on the content of a correct structured language label of the positive sample.
7. The information extraction model training method of claim 1, wherein the pre-training comprises:
Text data to structured language pre-training, which is used for carrying out structured information extraction processing on the input first text data based on the prompt of the input structured prompter template through the reference model to obtain a first structured language;
the method comprises the steps of generating pre-training of a structured language, wherein the pre-training is used for taking a first structured language label with a part of the structured language deleted as input, and carrying out complement processing on the first structured language label with the part of the structured language deleted through the reference model to obtain a complement structured language; and
the semantic coding pre-training is used for taking the first text data with the part data erased as input, and performing text prediction processing on the first text data with the part data erased through the reference model to obtain the predicted part data.
8. The information extraction model training method of claim 7, wherein the reference model comprises an encoder and a decoder,
wherein pre-training the reference model based on the first training sample data and the plurality of structured prompter templates comprises:
performing the text data to structured language pre-training and determining a first loss function based on the first structured language and the first structured language tag, wherein the first loss function is related to encoder parameters, decoder parameters and model parameters of the reference model,
Performing the structured language generation pre-training and determining a second loss function based on the completed structured language and the first structured language tag, wherein the second loss function is related to decoder parameters and model parameters of the reference model;
performing the semantic encoding pre-training and determining a third loss function based on the erased portion data and the predicted portion data, wherein the third loss function is related to encoder parameters, decoder parameters, and model parameters of the reference model;
and adjusting the reference model based on the first loss function, the second loss function and the third loss function to obtain the pre-training model.
9. The information extraction model training method of claim 6, wherein the fine tuning training comprises:
and carrying out structured information extraction processing on second text data in the second training sample data based on prompts of the structured prompter templates through the pre-training model to obtain a second structured language, determining a fine tuning loss function based on the second structured language and the second structured language label, and adjusting model parameters of the pre-training model based on the fine tuning loss function to obtain the information extraction model.
10. An information extraction method, characterized in that the information extraction method comprises:
acquiring text data to be processed, and respectively constructing a plurality of structured prompter templates aiming at a plurality of joint information extraction tasks;
inputting the text data to be processed and the plurality of structured prompter templates into an information extraction model to obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks;
the information extraction model is used for extracting structured languages respectively corresponding to the plurality of joint information extraction tasks from input text data based on prompt of the constructed structured prompter template.
11. An information extraction model training apparatus, the apparatus comprising:
the template construction unit is configured to construct a corresponding structured prompter template for each task in the plurality of joint information extraction tasks to obtain a plurality of structured prompter templates;
a training data acquisition unit configured to acquire first training sample data and second training sample data; the first training sample data comprises first text data and a first structured language label corresponding to the first text data, and the second training sample data comprises second text data and a second structured language label corresponding to the second text data;
A training unit configured to:
pre-training the reference model based on the first training sample data and the plurality of structured prompter templates to obtain a trained pre-training model;
performing fine tuning training on the pre-training model based on the second training sample data and the plurality of structured prompter templates to obtain a trained information extraction model;
the information extraction model is used for extracting structured languages respectively corresponding to the plurality of joint information extraction tasks from input text data based on prompt of the constructed structured prompter template.
12. An information extraction apparatus, the apparatus comprising:
the input unit is configured to acquire text data to be processed and a plurality of structured prompter templates respectively constructed for a plurality of joint information extraction tasks;
an information extraction unit configured to input the text data to be processed and the plurality of structured prompter templates into an information extraction model, obtain a plurality of structured languages respectively corresponding to the plurality of joint information extraction tasks,
the information extraction model is used for extracting structured languages respectively corresponding to the plurality of joint information extraction tasks from input text data based on prompt of the constructed structured prompter template.
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the information extraction model training method of any one of claims 1 to 9 and/or the information extraction method of claim 10.
14. A computer-readable storage medium, characterized in that instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the information extraction model training method of any one of claims 1 to 9 and/or the information extraction method of claim 10.
CN202210978578.9A 2022-08-16 2022-08-16 Information extraction model training method, information extraction method and device Pending CN116150613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210978578.9A CN116150613A (en) 2022-08-16 2022-08-16 Information extraction model training method, information extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210978578.9A CN116150613A (en) 2022-08-16 2022-08-16 Information extraction model training method, information extraction method and device

Publications (1)

Publication Number Publication Date
CN116150613A true CN116150613A (en) 2023-05-23

Family

ID=86349505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210978578.9A Pending CN116150613A (en) 2022-08-16 2022-08-16 Information extraction model training method, information extraction method and device

Country Status (1)

Country Link
CN (1) CN116150613A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172254A (en) * 2023-11-02 2023-12-05 成方金融科技有限公司 Model training method, information extraction method, device, equipment and storage medium
CN117172254B (en) * 2023-11-02 2024-01-16 成方金融科技有限公司 Model training method, information extraction method, device, equipment and storage medium
CN117787422A (en) * 2024-02-27 2024-03-29 四川金信石信息技术有限公司 Switching operation task extraction method and system
CN117787422B (en) * 2024-02-27 2024-04-26 四川金信石信息技术有限公司 Switching operation task extraction method and system

Similar Documents

Publication Publication Date Title
Yuan et al. Can we automate scientific reviewing?
US11537793B2 (en) System for providing intelligent part of speech processing of complex natural language
US11138382B2 (en) Neural network system for text classification
US20210201013A1 (en) Contract lifecycle management
US9058317B1 (en) System and method for machine learning management
US8452772B1 (en) Methods, systems, and articles of manufacture for addressing popular topics in a socials sphere
US9690849B2 (en) Systems and methods for determining atypical language
CN116150613A (en) Information extraction model training method, information extraction method and device
Shaghaghian et al. Customizing contextualized language models for legal document reviews
KR102409667B1 (en) Method of building training data of machine translation
CN112966097A (en) NLP-based marketing company financial news-express automatic generation method and system
Wang et al. Data set and evaluation of automated construction of financial knowledge graph
Wan et al. CFERE: Multi-type Chinese financial event relation extraction
Hong et al. Knowledge-grounded dialogue modelling with dialogue-state tracking, domain tracking, and entity extraction
US20220100967A1 (en) Lifecycle management for customized natural language processing
CN117556010A (en) Knowledge base and large model-based document generation system, method, equipment and medium
Klimczak Text analysis in finance: The challenges for efficient application
CN116861242A (en) Language perception multi-language pre-training and fine tuning method based on language discrimination prompt
Kiršienė et al. Digital transformation of legal services and access to Justice: challenges and possibilities
Vetter et al. Enhancing the IBM Power Systems Platform with IBM Watson Services
Choi et al. The nature of Saemaul Undong as a rural development strategy: Topic modelling and text mining analysis
CN111797237B (en) Text entity relationship recognition method, system and medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
Li et al. Theme analyses for open-ended survey responses in education research on summer melt phenomenon

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination